public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Improve surplus TLS accounting
@ 2020-06-22 16:20 Szabolcs Nagy
  2020-06-22 16:21 ` [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces Szabolcs Nagy
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Szabolcs Nagy @ 2020-06-22 16:20 UTC (permalink / raw)
  To: libc-alpha

Addressed the review comments, the tricky static TLS accounting
bits are unchanged. These are still outstanding:

> Subsequent followup after committing this:
> - We need to fix tst-manyaudit.
> - We should be able to count how many spaces we need based on LD_AUDIT
>   or DT_AUDIT and enable up to that amount.

Reran the tests on aarch64 and x86.

Szabolcs Nagy (2):
  rtld: Add rtld.nns tunable for the number of supported namespaces
  rtld: Avoid using up static TLS surplus for optimizations [BZ #25051]

 csu/libc-tls.c             |  31 +++++-----
 elf/Makefile               |  29 +++++++++-
 elf/dl-reloc.c             |  37 +++++++++---
 elf/dl-tls.c               |  56 ++++++++++++++++--
 elf/dl-tunables.list       |  14 +++++
 elf/dynamic-link.h         |   5 +-
 elf/rtld.c                 |   3 +
 elf/tst-tls-ie-dlmopen.c   | 114 +++++++++++++++++++++++++++++++++++++
 elf/tst-tls-ie-mod.h       |  40 +++++++++++++
 elf/tst-tls-ie-mod0.c      |   4 ++
 elf/tst-tls-ie-mod1.c      |   4 ++
 elf/tst-tls-ie-mod2.c      |   4 ++
 elf/tst-tls-ie-mod3.c      |   4 ++
 elf/tst-tls-ie-mod4.c      |   4 ++
 elf/tst-tls-ie-mod5.c      |   4 ++
 elf/tst-tls-ie-mod6.c      |   4 ++
 elf/tst-tls-ie.c           | 113 ++++++++++++++++++++++++++++++++++++
 manual/tunables.texi       |  38 +++++++++++++
 sysdeps/generic/ldsodefs.h |  11 ++++
 19 files changed, 487 insertions(+), 32 deletions(-)
 create mode 100644 elf/tst-tls-ie-dlmopen.c
 create mode 100644 elf/tst-tls-ie-mod.h
 create mode 100644 elf/tst-tls-ie-mod0.c
 create mode 100644 elf/tst-tls-ie-mod1.c
 create mode 100644 elf/tst-tls-ie-mod2.c
 create mode 100644 elf/tst-tls-ie-mod3.c
 create mode 100644 elf/tst-tls-ie-mod4.c
 create mode 100644 elf/tst-tls-ie-mod5.c
 create mode 100644 elf/tst-tls-ie-mod6.c
 create mode 100644 elf/tst-tls-ie.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-06-22 16:20 [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy
@ 2020-06-22 16:21 ` Szabolcs Nagy
  2020-07-06 13:13   ` Carlos O'Donell
  2020-06-22 16:21 ` [PATCH v5 2/2] rtld: Avoid using up static TLS surplus for optimizations [BZ #25051] Szabolcs Nagy
  2020-06-26 10:50 ` [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy
  2 siblings, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2020-06-22 16:21 UTC (permalink / raw)
  To: libc-alpha

TLS_STATIC_SURPLUS is 1664 bytes currently which is not enough to
support DL_NNS (== 16) number of dynamic link namespaces, if we
assume 192 bytes of TLS are reserved for libc use and 144 bytes
are reserved for other system libraries that use IE TLS.

A new tunable is introduced to control the number of supported
namespaces and to adjust the surplus static TLS size as follows:

surplus_tls = 192 * (rtld.nns-1) + 144 * rtld.nns + 512

The default is rtld.nns == 4 and then the surplus TLS size is the
same as before, so the behaviour is unchanged by default. If an
application creates more namespaces than the rtld.nns setting
allows, then it is not guaranteed to work, but the limit is not
checked. So existing usage will continue to work, but in the
future if an application creates more than 4 dynamic link
namespaces then the tunable will need to be set.

In this patch DL_NNS is a fixed value and provides a maximum to
the rtld.nns setting.

Static linking used fixed 2048 bytes surplus TLS, this is changed
so the same contract is used as for dynamic linking.  With static
linking DL_NNS == 1 so rtld.nns tunable is forced to 1, so by
default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
change is not expected to cause problems.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

---

v5:
- Split the patch into two: rtld.nns and tls optimization parts.
- Use rtld.nns instead of dl.nns.
- Renamed the init function and moved the calls close to
  the first use of surplus tls size.
- Updated the tunable documentation.
v4:
- Rebased and moved this log out of the commit message.
- Minor commit message wording changes.
v3:
- archived at
  https://sourceware.org/pipermail/libc-alpha/2020-March/111660.html
- Replace TLS_STATIC_SURPLUS with GLRO(dl_tls_static_surplus) and
  simplify related logic.
- In case of static linking, replace GL(dl_tls_static_size) with
  GLRO(dl_tls_static_surplus) in the code paths before the
  GL(dl_tls_static_size) value is actually computed.
- Update comments and the test code.
- Document the new tunables.
- Update description, mention static linking.
v2:
- Add dl.nns tunable.
- Add dl.optional_static_tls tunable.
- New surplus TLS usage contract that works reliably up to dl.nns
  namespaces.
---
 csu/libc-tls.c             | 28 +++++++++----------
 elf/dl-tls.c               | 55 ++++++++++++++++++++++++++++++++++----
 elf/dl-tunables.list       |  9 +++++++
 elf/rtld.c                 |  3 +++
 manual/tunables.texi       | 21 +++++++++++++++
 sysdeps/generic/ldsodefs.h |  8 ++++++
 6 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index 73ade0fec5..e2603157e8 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -46,13 +46,16 @@ bool _dl_tls_dtv_gaps;
 struct dtv_slotinfo_list *_dl_tls_dtv_slotinfo_list;
 /* Number of modules in the static TLS block.  */
 size_t _dl_tls_static_nelem;
-/* Size of the static TLS block.  Giving this initialized value
-   preallocates some surplus bytes in the static TLS area.  */
-size_t _dl_tls_static_size = 2048;
+/* Size of the static TLS block.  */
+size_t _dl_tls_static_size;
 /* Size actually allocated in the static TLS block.  */
 size_t _dl_tls_static_used;
 /* Alignment requirement of the static TLS block.  */
 size_t _dl_tls_static_align;
+/* Size of surplus space in the static TLS area for dynamically
+   loaded modules with IE-model TLS or for TLSDESC optimization.
+   See comments in elf/dl-tls.c where it is initialized.  */
+size_t _dl_tls_static_surplus;
 
 /* Generation counter for the dtv.  */
 size_t _dl_tls_generation;
@@ -81,10 +84,8 @@ init_slotinfo (void)
 static void
 init_static_tls (size_t memsz, size_t align)
 {
-  /* That is the size of the TLS memory for this object.  The initialized
-     value of _dl_tls_static_size is provided by dl-open.c to request some
-     surplus that permits dynamic loading of modules with IE-model TLS.  */
-  GL(dl_tls_static_size) = roundup (memsz + GL(dl_tls_static_size),
+  /* That is the size of the TLS memory for this object.  */
+  GL(dl_tls_static_size) = roundup (memsz + GLRO(dl_tls_static_surplus),
 				    TLS_TCB_ALIGN);
 #if TLS_TCB_AT_TP
   GL(dl_tls_static_size) += TLS_TCB_SIZE;
@@ -125,25 +126,24 @@ __libc_setup_tls (void)
 	  break;
 	}
 
+  /* Calculate the size of the static TLS surplus.  */
+  _dl_tls_static_surplus_init ();
+
   /* We have to set up the TCB block which also (possibly) contains
      'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
      Instead we use 'sbrk' which would only uses 'errno' if it fails.
      In this case we are right away out of memory and the user gets
-     what she/he deserves.
-
-     The initialized value of _dl_tls_static_size is provided by dl-open.c
-     to request some surplus that permits dynamic loading of modules with
-     IE-model TLS.  */
+     what she/he deserves.  */
 #if TLS_TCB_AT_TP
   /* Align the TCB offset to the maximum alignment, as
      _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
      and dl_tls_static_align.  */
-  tcb_offset = roundup (memsz + GL(dl_tls_static_size), max_align);
+  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
   tlsblock = __sbrk (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
 #elif TLS_DTV_AT_TP
   tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
   tlsblock = __sbrk (tcb_offset + memsz + max_align
-		     + TLS_PRE_TCB_SIZE + GL(dl_tls_static_size));
+		     + TLS_PRE_TCB_SIZE + GLRO(dl_tls_static_surplus));
   tlsblock += TLS_PRE_TCB_SIZE;
 #else
   /* In case a model with a different layout for the TCB and DTV
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index fa03234610..2201a1cc1d 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -29,10 +29,54 @@
 #include <dl-tls.h>
 #include <ldsodefs.h>
 
-/* Amount of excess space to allocate in the static TLS area
-   to allow dynamic loading of modules defining IE-model TLS data.  */
-#define TLS_STATIC_SURPLUS	64 + DL_NNS * 100
+#define TUNABLE_NAMESPACE rtld
+#include <dl-tunables.h>
+
+/* Surplus static TLS, GLRO(dl_tls_static_surplus), is used for
+
+   - IE TLS in libc.so for all dlmopen namespaces except in the initial
+     one where libc.so is not loaded dynamically but at startup time,
+   - IE TLS in other libraries which may be dynamically loaded even in the
+     initial namespace,
+   - and optionally for optimizing dynamic TLS access.
+
+   The maximum number of namespaces is DL_NNS, but to support that many
+   namespaces correctly the static TLS allocation should be significantly
+   increased, which may cause problems with small thread stacks due to the
+   way static TLS is accounted (bug 11787).
+
+   So there is a rtld.nns tunable limit on the number of supported namespaces
+   that affects the size of the static TLS and by default it's small enough
+   not to cause problems with existing applications. The limit is not
+   enforced or checked: it is the user's responsibility to increase rtld.nns
+   if more dlmopen namespaces are used.  */
+
+/* Size of initial-exec TLS in libc.so.  */
+#define LIBC_IE_TLS 192
+/* Size of initial-exec TLS in libraries other than libc.so.
+   This should be large enough to cover runtime libraries of the
+   compiler such as libgomp and libraries in libc other than libc.so.  */
+#define OTHER_IE_TLS 144
+/* Size of additional surplus TLS, placeholder for TLS optimizations.  */
+#define OPT_SURPLUS_TLS 512
 
+void
+_dl_tls_static_surplus_init (void)
+{
+  size_t nns;
+
+#if HAVE_TUNABLES
+  nns = TUNABLE_GET (nns, size_t, NULL);
+#else
+  /* Default values of the tunables.  */
+  nns = 4;
+#endif
+  if (nns > DL_NNS)
+    nns = DL_NNS;
+  GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
+				 + nns * OTHER_IE_TLS
+				 + OPT_SURPLUS_TLS);
+}
 
 /* Out-of-memory handler.  */
 static void
@@ -218,7 +262,8 @@ _dl_determine_tlsoffset (void)
     }
 
   GL(dl_tls_static_used) = offset;
-  GL(dl_tls_static_size) = (roundup (offset + TLS_STATIC_SURPLUS, max_align)
+  GL(dl_tls_static_size) = (roundup (offset + GLRO(dl_tls_static_surplus),
+				     max_align)
 			    + TLS_TCB_SIZE);
 #elif TLS_DTV_AT_TP
   /* The TLS blocks start right after the TCB.  */
@@ -262,7 +307,7 @@ _dl_determine_tlsoffset (void)
     }
 
   GL(dl_tls_static_used) = offset;
-  GL(dl_tls_static_size) = roundup (offset + TLS_STATIC_SURPLUS,
+  GL(dl_tls_static_size) = roundup (offset + GLRO(dl_tls_static_surplus),
 				    TLS_TCB_ALIGN);
 #else
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 0d398dd251..b07742d7b3 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -126,4 +126,13 @@ glibc {
       default: 3
     }
   }
+
+  rtld {
+    nns {
+      type: SIZE_T
+      minval: 1
+      maxval: 16
+      default: 4
+    }
+  }
 }
diff --git a/elf/rtld.c b/elf/rtld.c
index f4c2602d65..f339f6894f 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -780,6 +780,9 @@ init_tls (void)
       }
   assert (i == GL(dl_tls_max_dtv_idx));
 
+  /* Calculate the size of the static TLS surplus.  */
+  _dl_tls_static_surplus_init ();
+
   /* Compute the TLS offsets for the various blocks.  */
   _dl_determine_tlsoffset ();
 
diff --git a/manual/tunables.texi b/manual/tunables.texi
index ec18b10834..978e08f4fb 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -31,6 +31,7 @@ their own namespace.
 @menu
 * Tunable names::  The structure of a tunable name
 * Memory Allocation Tunables::  Tunables in the memory allocation subsystem
+* Dynamic Linking Tunables:: Tunables in the dynamic linking subsystem
 * Elision Tunables::  Tunables in elision subsystem
 * POSIX Thread Tunables:: Tunables in the POSIX thread subsystem
 * Hardware Capability Tunables::  Tunables that modify the hardware
@@ -226,6 +227,26 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
 passed to @code{malloc} for the largest bin size to enable.
 @end deftp
 
+@node Dynamic Linking Tunables
+@section Dynamic Linking Tunables
+@cindex dynamic linking tunables
+@cindex rtld tunables
+
+@deftp {Tunable namespace} glibc.rtld
+Dynamic linker behavior can be modified by setting the
+following tunables in the @code{rtld} namespace:
+@end deftp
+
+@deftp Tunable glibc.rtld.nns
+Sets the number of supported dynamic link namespaces (see @code{dlmopen}).
+Currently this limit can be set between 1 and 16 inclusive, the default is 4.
+Each link namespace consumes some memory in all thread, and thus raising the
+limit will increase the amount of memory each thread uses. Raising the limit
+is useful when your application uses more than 4 dynamic linker audit modules
+e.g. LD_AUDIT, or will use more than 4 dynamic link namespaces as created
+by @code{dlmopen} with an lmid argument of @code{LM_ID_NEWLM}.
+@end deftp
+
 @node Elision Tunables
 @section Elision Tunables
 @cindex elision tunables
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index c525ffa12c..3b0c6d9620 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -583,6 +583,11 @@ struct rtld_global_ro
      binaries, don't honor for PIEs).  */
   EXTERN ElfW(Addr) _dl_use_load_bias;
 
+  /* Size of surplus space in the static TLS area for dynamically
+     loaded modules with IE-model TLS or for TLSDESC optimization.
+     See comments in elf/dl-tls.c where it is initialized.  */
+  EXTERN size_t _dl_tls_static_surplus;
+
   /* Name of the shared object to be profiled (if any).  */
   EXTERN const char *_dl_profile;
   /* Filename of the output file.  */
@@ -1101,6 +1106,9 @@ extern size_t _dl_count_modids (void) attribute_hidden;
 /* Calculate offset of the TLS blocks in the static TLS block.  */
 extern void _dl_determine_tlsoffset (void) attribute_hidden;
 
+/* Calculate the size of the static TLS surplus.  */
+void _dl_tls_static_surplus_init (void) attribute_hidden;
+
 #ifndef SHARED
 /* Set up the TCB for statically linked applications.  This is called
    early during startup because we always use TLS (for errno and the
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 2/2] rtld: Avoid using up static TLS surplus for optimizations [BZ #25051]
  2020-06-22 16:20 [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy
  2020-06-22 16:21 ` [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces Szabolcs Nagy
@ 2020-06-22 16:21 ` Szabolcs Nagy
  2020-07-06 13:19   ` Carlos O'Donell
  2020-06-26 10:50 ` [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy
  2 siblings, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2020-06-22 16:21 UTC (permalink / raw)
  To: libc-alpha

On some targets static TLS surplus area can be used opportunistically
for dynamically loaded modules such that the TLS access then becomes
faster (TLSDESC and powerpc TLS optimization). However we don't want
all surplus TLS to be used for this optimization because dynamically
loaded modules with initial-exec model TLS can only use surplus TLS.

The new contract for surplus static TLS use is:

- libc.so can have up to 192 bytes of IE TLS,
- other system libraries together can have up to 144 bytes of IE TLS.
- Some "optional" static TLS is available for opportunistic use.

The optional TLS is now tunable: rtld.optional_static_tls, so users
can directly affect the allocated static TLS size. (Note that module
unloading with dlclose does not reclaim static TLS. After the optional
TLS runs out, TLS access is no longer optimized to use static TLS.)

The default setting of rtld.optional_static_tls is 512 so the surplus
TLS is 3*192 + 4*144 + 512 = 1664 by default, the same as before.

Fixes BZ #25051.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

---

v5:
- Split the patch into two: rtld.nns and tls optimization parts.
- Use rtld tunable namespace.
- Set tunable min val to 0.
- New test using dlmopen.
- Tests now have check for failing IE TLS dlopen.
- Updated the tunable documentation.
v4:
- Rebased and moved this log out of the commit message.
- Minor commit message wording changes.
v3:
- archived at
  https://sourceware.org/pipermail/libc-alpha/2020-March/111660.html
- Replace TLS_STATIC_SURPLUS with GLRO(dl_tls_static_surplus) and
  simplify related logic.
- In case of static linking, replace GL(dl_tls_static_size) with
  GLRO(dl_tls_static_surplus) in the code paths before the
  GL(dl_tls_static_size) value is actually computed.
- Update comments and the test code.
- Document the new tunables.
- Update description, mention static linking.
v2:
- Add dl.nns tunable.
- Add dl.optional_static_tls tunable.
- New surplus TLS usage contract that works reliably up to dl.nns
  namespaces.
---
 csu/libc-tls.c             |   3 +
 elf/Makefile               |  29 +++++++++-
 elf/dl-reloc.c             |  37 +++++++++---
 elf/dl-tls.c               |   9 +--
 elf/dl-tunables.list       |   5 ++
 elf/dynamic-link.h         |   5 +-
 elf/tst-tls-ie-dlmopen.c   | 114 +++++++++++++++++++++++++++++++++++++
 elf/tst-tls-ie-mod.h       |  40 +++++++++++++
 elf/tst-tls-ie-mod0.c      |   4 ++
 elf/tst-tls-ie-mod1.c      |   4 ++
 elf/tst-tls-ie-mod2.c      |   4 ++
 elf/tst-tls-ie-mod3.c      |   4 ++
 elf/tst-tls-ie-mod4.c      |   4 ++
 elf/tst-tls-ie-mod5.c      |   4 ++
 elf/tst-tls-ie-mod6.c      |   4 ++
 elf/tst-tls-ie.c           | 113 ++++++++++++++++++++++++++++++++++++
 manual/tunables.texi       |  17 ++++++
 sysdeps/generic/ldsodefs.h |   3 +
 18 files changed, 386 insertions(+), 17 deletions(-)
 create mode 100644 elf/tst-tls-ie-dlmopen.c
 create mode 100644 elf/tst-tls-ie-mod.h
 create mode 100644 elf/tst-tls-ie-mod0.c
 create mode 100644 elf/tst-tls-ie-mod1.c
 create mode 100644 elf/tst-tls-ie-mod2.c
 create mode 100644 elf/tst-tls-ie-mod3.c
 create mode 100644 elf/tst-tls-ie-mod4.c
 create mode 100644 elf/tst-tls-ie-mod5.c
 create mode 100644 elf/tst-tls-ie-mod6.c
 create mode 100644 elf/tst-tls-ie.c

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index e2603157e8..fb77cd94fa 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -56,6 +56,9 @@ size_t _dl_tls_static_align;
    loaded modules with IE-model TLS or for TLSDESC optimization.
    See comments in elf/dl-tls.c where it is initialized.  */
 size_t _dl_tls_static_surplus;
+/* Remaining amount of static TLS that may be used for optimizing
+   dynamic TLS access (e.g. with TLSDESC).  */
+size_t _dl_tls_static_optional;
 
 /* Generation counter for the dtv.  */
 size_t _dl_tls_generation;
diff --git a/elf/Makefile b/elf/Makefile
index 6fe1df90bb..5fadaec27c 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -204,7 +204,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-dlopen-self tst-auditmany tst-initfinilazyfail tst-dlopenfail \
 	 tst-dlopenfail-2 \
 	 tst-filterobj tst-filterobj-dlopen tst-auxobj tst-auxobj-dlopen \
-	 tst-audit14 tst-audit15 tst-audit16
+	 tst-audit14 tst-audit15 tst-audit16 \
+	 tst-tls-ie tst-tls-ie-dlmopen
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -317,7 +318,11 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
 		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
 		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
-		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3
+		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
+		tst-tls-ie-mod0 tst-tls-ie-mod1 tst-tls-ie-mod2 \
+		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5 \
+		tst-tls-ie-mod6
+
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
 modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
@@ -1748,3 +1753,23 @@ $(objpfx)tst-auxobj: $(objpfx)tst-filterobj-aux.so
 $(objpfx)tst-auxobj-dlopen: $(libdl)
 $(objpfx)tst-auxobj.out: $(objpfx)tst-filterobj-filtee.so
 $(objpfx)tst-auxobj-dlopen.out: $(objpfx)tst-filterobj-filtee.so
+
+$(objpfx)tst-tls-ie: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls-ie.out: \
+  $(objpfx)tst-tls-ie-mod0.so \
+  $(objpfx)tst-tls-ie-mod1.so \
+  $(objpfx)tst-tls-ie-mod2.so \
+  $(objpfx)tst-tls-ie-mod3.so \
+  $(objpfx)tst-tls-ie-mod4.so \
+  $(objpfx)tst-tls-ie-mod5.so \
+  $(objpfx)tst-tls-ie-mod6.so
+
+$(objpfx)tst-tls-ie-dlmopen: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls-ie-dlmopen.out: \
+  $(objpfx)tst-tls-ie-mod0.so \
+  $(objpfx)tst-tls-ie-mod1.so \
+  $(objpfx)tst-tls-ie-mod2.so \
+  $(objpfx)tst-tls-ie-mod3.so \
+  $(objpfx)tst-tls-ie-mod4.so \
+  $(objpfx)tst-tls-ie-mod5.so \
+  $(objpfx)tst-tls-ie-mod6.so
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index ffcc84d396..6d32e49467 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -39,13 +39,16 @@
 /* We are trying to perform a static TLS relocation in MAP, but it was
    dynamically loaded.  This can only work if there is enough surplus in
    the static TLS area already allocated for each running thread.  If this
-   object's TLS segment is too big to fit, we fail.  If it fits,
-   we set MAP->l_tls_offset and return.
-   This function intentionally does not return any value but signals error
-   directly, as static TLS should be rare and code handling it should
-   not be inlined as much as possible.  */
+   object's TLS segment is too big to fit, we fail with -1.  If it fits,
+   we set MAP->l_tls_offset and return 0.
+   A portion of the surplus static TLS can be optionally used to optimize
+   dynamic TLS access (with TLSDESC or powerpc TLS optimizations).
+   If OPTIONAL is true then TLS is allocated for such optimization and
+   the caller must have a fallback in case the optional portion of surplus
+   TLS runs out.  If OPTIONAL is false then the entire surplus TLS area is
+   considered and the allocation only fails if that runs out.  */
 int
-_dl_try_allocate_static_tls (struct link_map *map)
+_dl_try_allocate_static_tls (struct link_map *map, bool optional)
 {
   /* If we've already used the variable with dynamic access, or if the
      alignment requirements are too high, fail.  */
@@ -68,8 +71,14 @@ _dl_try_allocate_static_tls (struct link_map *map)
 
   size_t n = (freebytes - blsize) / map->l_tls_align;
 
-  size_t offset = GL(dl_tls_static_used) + (freebytes - n * map->l_tls_align
-					    - map->l_tls_firstbyte_offset);
+  /* Account optional static TLS surplus usage.  */
+  size_t use = freebytes - n * map->l_tls_align - map->l_tls_firstbyte_offset;
+  if (optional && use > GL(dl_tls_static_optional))
+    goto fail;
+  else if (optional)
+    GL(dl_tls_static_optional) -= use;
+
+  size_t offset = GL(dl_tls_static_used) + use;
 
   map->l_tls_offset = GL(dl_tls_static_used) = offset;
 #elif TLS_DTV_AT_TP
@@ -83,6 +92,13 @@ _dl_try_allocate_static_tls (struct link_map *map)
   if (used > GL(dl_tls_static_size))
     goto fail;
 
+  /* Account optional static TLS surplus usage.  */
+  size_t use = used - GL(dl_tls_static_used);
+  if (optional && use > GL(dl_tls_static_optional))
+    goto fail;
+  else if (optional)
+    GL(dl_tls_static_optional) -= use;
+
   map->l_tls_offset = offset;
   map->l_tls_firstbyte_offset = GL(dl_tls_static_used);
   GL(dl_tls_static_used) = used;
@@ -110,12 +126,15 @@ _dl_try_allocate_static_tls (struct link_map *map)
   return 0;
 }
 
+/* This function intentionally does not return any value but signals error
+   directly, as static TLS should be rare and code handling it should
+   not be inlined as much as possible.  */
 void
 __attribute_noinline__
 _dl_allocate_static_tls (struct link_map *map)
 {
   if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
-      || _dl_try_allocate_static_tls (map))
+      || _dl_try_allocate_static_tls (map, false))
     {
       _dl_signal_error (0, map->l_name, NULL, N_("\
 cannot allocate memory in static TLS block"));
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 2201a1cc1d..af5db12d08 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -57,25 +57,26 @@
    This should be large enough to cover runtime libraries of the
    compiler such as libgomp and libraries in libc other than libc.so.  */
 #define OTHER_IE_TLS 144
-/* Size of additional surplus TLS, placeholder for TLS optimizations.  */
-#define OPT_SURPLUS_TLS 512
 
 void
 _dl_tls_static_surplus_init (void)
 {
-  size_t nns;
+  size_t nns, opt_tls;
 
 #if HAVE_TUNABLES
   nns = TUNABLE_GET (nns, size_t, NULL);
+  opt_tls = TUNABLE_GET (optional_static_tls, size_t, NULL);
 #else
   /* Default values of the tunables.  */
   nns = 4;
+  opt_tls = 512;
 #endif
   if (nns > DL_NNS)
     nns = DL_NNS;
+  GL(dl_tls_static_optional) = opt_tls;
   GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
 				 + nns * OTHER_IE_TLS
-				 + OPT_SURPLUS_TLS);
+				 + opt_tls);
 }
 
 /* Out-of-memory handler.  */
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index b07742d7b3..35634ef24d 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -134,5 +134,10 @@ glibc {
       maxval: 16
       default: 4
     }
+    optional_static_tls {
+      type: SIZE_T
+      minval: 0
+      default: 512
+    }
   }
 }
diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
index bb7a66f4cd..6727233e1a 100644
--- a/elf/dynamic-link.h
+++ b/elf/dynamic-link.h
@@ -40,9 +40,10 @@
     (__builtin_expect ((sym_map)->l_tls_offset				\
 		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
      && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
-	 || _dl_try_allocate_static_tls (sym_map) == 0))
+	 || _dl_try_allocate_static_tls (sym_map, true) == 0))
 
-int _dl_try_allocate_static_tls (struct link_map *map) attribute_hidden;
+int _dl_try_allocate_static_tls (struct link_map *map, bool optional)
+  attribute_hidden;
 
 #include <elf.h>
 
diff --git a/elf/tst-tls-ie-dlmopen.c b/elf/tst-tls-ie-dlmopen.c
new file mode 100644
index 0000000000..0be47c7237
--- /dev/null
+++ b/elf/tst-tls-ie-dlmopen.c
@@ -0,0 +1,114 @@
+/* Test dlopen of modules with initial-exec TLS after dlmopen.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This test tries to check that surplus static TLS is not used up for
+   dynamic TLS optimizations and 4*144 = 576 bytes of static TLS is
+   still available for dlopening modules with initial-exec TLS after 3
+   new dlmopen namespaces are created.  It depends on rtld.nns=4 and
+   rtld.optional_static_tls=512 tunable settings.  */
+
+#include <errno.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static int do_test (void);
+#include <support/xthread.h>
+#include <support/xdlfcn.h>
+#include <support/test-driver.c>
+
+/* Have some big TLS in the main exe: should not use surplus TLS.  */
+__thread char maintls[1000];
+
+static pthread_barrier_t barrier;
+
+/* Forces multi-threaded behaviour.  */
+static void *
+blocked_thread_func (void *closure)
+{
+  xpthread_barrier_wait (&barrier);
+  /* TLS load and access tests run here in the main thread.  */
+  xpthread_barrier_wait (&barrier);
+  return NULL;
+}
+
+static void *
+load_and_access (Lmid_t lmid, const char *mod, const char *func)
+{
+  /* Load module with TLS.  */
+  void *p = xdlmopen (lmid, mod, RTLD_NOW);
+  /* Access the TLS variable to ensure it is allocated.  */
+  void (*f) (void) = (void (*) (void))xdlsym (p, func);
+  f ();
+  return p;
+}
+
+static int
+do_test (void)
+{
+  void *mods[5];
+
+  {
+    int ret = pthread_barrier_init (&barrier, NULL, 2);
+    if (ret != 0)
+      {
+        errno = ret;
+        printf ("error: pthread_barrier_init: %m\n");
+        exit (1);
+      }
+  }
+
+  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
+  xpthread_barrier_wait (&barrier);
+
+  printf ("maintls[%zu]:\t %p .. %p\n",
+	   sizeof maintls, maintls, maintls + sizeof maintls);
+  memset (maintls, 1, sizeof maintls);
+
+  /* Load modules with dynamic TLS (use surplus static TLS for libc
+     in new namespaces and may be for TLS optimizations too).  */
+  mods[0] = load_and_access (LM_ID_BASE, "tst-tls-ie-mod0.so", "access0");
+  mods[1] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod1.so", "access1");
+  mods[2] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod2.so", "access2");
+  mods[3] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod3.so", "access3");
+  /* Load modules with initial-exec TLS (can only use surplus static TLS).  */
+  mods[4] = load_and_access (LM_ID_BASE, "tst-tls-ie-mod6.so", "access6");
+
+  /* Here 576 bytes + 3 * libc use of surplus static TLS is in use so less
+     than 1024 bytes are available (exact number depends on TLS optimizations
+     and the libc TLS use).  */
+  printf ("The next dlmopen should fail...\n");
+  void *p = dlmopen (LM_ID_BASE, "tst-tls-ie-mod4.so", RTLD_NOW);
+  if (p != NULL)
+    {
+      printf ("error: expected dlmopen to fail because there is "
+	      "not enough surplus static TLS.\n");
+      exit (1);
+    }
+  printf ("...OK failed with: %s.\n", dlerror ());
+
+  xpthread_barrier_wait (&barrier);
+  xpthread_join (blocked_thread);
+
+  /* Close the modules.  */
+  for (int i = 0; i < 5; ++i)
+    xdlclose (mods[i]);
+
+  return 0;
+}
diff --git a/elf/tst-tls-ie-mod.h b/elf/tst-tls-ie-mod.h
new file mode 100644
index 0000000000..46b362a9b7
--- /dev/null
+++ b/elf/tst-tls-ie-mod.h
@@ -0,0 +1,40 @@
+/* Module with specified TLS size and model.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This file is parameterized by macros N, SIZE and MODEL.  */
+
+#include <stdio.h>
+#include <string.h>
+
+#define CONCATX(x, y) x ## y
+#define CONCAT(x, y) CONCATX (x, y)
+#define STRX(x) #x
+#define STR(x) STRX (x)
+
+#define VAR CONCAT (var, N)
+
+__attribute__ ((aligned (8), tls_model (MODEL)))
+__thread char VAR[SIZE];
+
+void
+CONCAT (access, N) (void)
+{
+  printf (STR (VAR) "[%d]:\t %p .. %p " MODEL "\n", SIZE, VAR, VAR + SIZE);
+  fflush (stdout);
+  memset (VAR, 1, SIZE);
+}
diff --git a/elf/tst-tls-ie-mod0.c b/elf/tst-tls-ie-mod0.c
new file mode 100644
index 0000000000..2450686e40
--- /dev/null
+++ b/elf/tst-tls-ie-mod0.c
@@ -0,0 +1,4 @@
+#define N 0
+#define SIZE 480
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod1.c b/elf/tst-tls-ie-mod1.c
new file mode 100644
index 0000000000..849ff91e53
--- /dev/null
+++ b/elf/tst-tls-ie-mod1.c
@@ -0,0 +1,4 @@
+#define N 1
+#define SIZE 120
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod2.c b/elf/tst-tls-ie-mod2.c
new file mode 100644
index 0000000000..23915ab67b
--- /dev/null
+++ b/elf/tst-tls-ie-mod2.c
@@ -0,0 +1,4 @@
+#define N 2
+#define SIZE 24
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod3.c b/elf/tst-tls-ie-mod3.c
new file mode 100644
index 0000000000..5395f844a5
--- /dev/null
+++ b/elf/tst-tls-ie-mod3.c
@@ -0,0 +1,4 @@
+#define N 3
+#define SIZE 16
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod4.c b/elf/tst-tls-ie-mod4.c
new file mode 100644
index 0000000000..93ac2eacae
--- /dev/null
+++ b/elf/tst-tls-ie-mod4.c
@@ -0,0 +1,4 @@
+#define N 4
+#define SIZE 1024
+#define MODEL "initial-exec"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod5.c b/elf/tst-tls-ie-mod5.c
new file mode 100644
index 0000000000..84b3fd285b
--- /dev/null
+++ b/elf/tst-tls-ie-mod5.c
@@ -0,0 +1,4 @@
+#define N 5
+#define SIZE 128
+#define MODEL "initial-exec"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod6.c b/elf/tst-tls-ie-mod6.c
new file mode 100644
index 0000000000..c736bf0684
--- /dev/null
+++ b/elf/tst-tls-ie-mod6.c
@@ -0,0 +1,4 @@
+#define N 6
+#define SIZE 576
+#define MODEL "initial-exec"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie.c b/elf/tst-tls-ie.c
new file mode 100644
index 0000000000..c06454c50c
--- /dev/null
+++ b/elf/tst-tls-ie.c
@@ -0,0 +1,113 @@
+/* Test dlopen of modules with initial-exec TLS.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This test tries to check that surplus static TLS is not used up for
+   dynamic TLS optimizations and 3*192 + 4*144 = 1152 bytes of static
+   TLS is available for dlopening modules with initial-exec TLS.  It
+   depends on rtld.nns=4 and rtld.optional_static_tls=512 tunable setting.  */
+
+#include <errno.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static int do_test (void);
+#include <support/xthread.h>
+#include <support/xdlfcn.h>
+#include <support/test-driver.c>
+
+/* Have some big TLS in the main exe: should not use surplus TLS.  */
+__thread char maintls[1000];
+
+static pthread_barrier_t barrier;
+
+/* Forces multi-threaded behaviour.  */
+static void *
+blocked_thread_func (void *closure)
+{
+  xpthread_barrier_wait (&barrier);
+  /* TLS load and access tests run here in the main thread.  */
+  xpthread_barrier_wait (&barrier);
+  return NULL;
+}
+
+static void *
+load_and_access (const char *mod, const char *func)
+{
+  /* Load module with TLS.  */
+  void *p = xdlopen (mod, RTLD_NOW);
+  /* Access the TLS variable to ensure it is allocated.  */
+  void (*f) (void) = (void (*) (void))xdlsym (p, func);
+  f ();
+  return p;
+}
+
+static int
+do_test (void)
+{
+  void *mods[6];
+
+  {
+    int ret = pthread_barrier_init (&barrier, NULL, 2);
+    if (ret != 0)
+      {
+        errno = ret;
+        printf ("error: pthread_barrier_init: %m\n");
+        exit (1);
+      }
+  }
+
+  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
+  xpthread_barrier_wait (&barrier);
+
+  printf ("maintls[%zu]:\t %p .. %p\n",
+	   sizeof maintls, maintls, maintls + sizeof maintls);
+  memset (maintls, 1, sizeof maintls);
+
+  /* Load modules with dynamic TLS (may use surplus static TLS
+     opportunistically).  */
+  mods[0] = load_and_access ("tst-tls-ie-mod0.so", "access0");
+  mods[1] = load_and_access ("tst-tls-ie-mod1.so", "access1");
+  mods[2] = load_and_access ("tst-tls-ie-mod2.so", "access2");
+  mods[3] = load_and_access ("tst-tls-ie-mod3.so", "access3");
+  /* Load modules with initial-exec TLS (can only use surplus static TLS).  */
+  mods[4] = load_and_access ("tst-tls-ie-mod4.so", "access4");
+  mods[5] = load_and_access ("tst-tls-ie-mod5.so", "access5");
+
+  /* Here 1152 bytes of surplus static TLS is in use and at most 512 bytes
+     are available (depending on TLS optimizations).  */
+  printf ("The next dlopen should fail...\n");
+  void *p = dlopen ("tst-tls-ie-mod6.so", RTLD_NOW);
+  if (p != NULL)
+    {
+      printf ("error: expected dlopen to fail because there is "
+	      "not enough surplus static TLS.\n");
+      exit (1);
+    }
+  printf ("...OK failed with: %s.\n", dlerror ());
+
+  xpthread_barrier_wait (&barrier);
+  xpthread_join (blocked_thread);
+
+  /* Close the modules.  */
+  for (int i = 0; i < 6; ++i)
+    xdlclose (mods[i]);
+
+  return 0;
+}
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 978e08f4fb..7f891c2710 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -247,6 +247,23 @@ e.g. LD_AUDIT, or will use more than 4 dynamic link namespaces as created
 by @code{dlmopen} with an lmid argument of @code{LM_ID_NEWLM}.
 @end deftp
 
+@deftp Tunable glibc.rtld.optional_static_tls
+Sets the amount of surplus static TLS in bytes to allocate at program
+startup.  Every thread created allocates this amount of specified surplus
+static TLS. This is a minimum value and additional space may be allocated
+for internal purposes including alignment.  Optional static TLS is used for
+optimizing dynamic TLS access for platforms that support such optimizations
+e.g. TLS descriptors or optimized TLS access for POWER (@code{DT_PPC64_OPT}
+and @code{DT_PPC_OPT}).  In order to make the best use of such optimizations
+the value should be as many bytes as would be required to hold all TLS
+variables in all dynamic loaded shared libraries.  The value cannot be known
+by the dynamic loader because it doesn't know the expected set of shared
+libraries which will be loaded.  The existing static TLS space cannot be
+changed once allocated at process startup.  The default allocation of
+optional static TLS is 512 bytes and is allocated in every thread.
+@end deftp
+
+
 @node Elision Tunables
 @section Elision Tunables
 @cindex elision tunables
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 3b0c6d9620..997084fb4b 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -442,6 +442,9 @@ struct rtld_global
   EXTERN size_t _dl_tls_static_used;
   /* Alignment requirement of the static TLS block.  */
   EXTERN size_t _dl_tls_static_align;
+  /* Remaining amount of static TLS that may be used for optimizing
+     dynamic TLS access (e.g. with TLSDESC).  */
+  EXTERN size_t _dl_tls_static_optional;
 
 /* Number of additional entries in the slotinfo array of each slotinfo
    list element.  A large number makes it almost certain take we never
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/2] Improve surplus TLS accounting
  2020-06-22 16:20 [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy
  2020-06-22 16:21 ` [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces Szabolcs Nagy
  2020-06-22 16:21 ` [PATCH v5 2/2] rtld: Avoid using up static TLS surplus for optimizations [BZ #25051] Szabolcs Nagy
@ 2020-06-26 10:50 ` Szabolcs Nagy
  2 siblings, 0 replies; 12+ messages in thread
From: Szabolcs Nagy @ 2020-06-26 10:50 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1229 bytes --]

The 06/22/2020 17:20, Szabolcs Nagy wrote:
> Addressed the review comments, the tricky static TLS accounting
> bits are unchanged. These are still outstanding:
> 
> > Subsequent followup after committing this:
> > - We need to fix tst-manyaudit.
> > - We should be able to count how many spaces we need based on LD_AUDIT
> >   or DT_AUDIT and enable up to that amount.
> 
> Reran the tests on aarch64 and x86.
> 
> Szabolcs Nagy (2):
>   rtld: Add rtld.nns tunable for the number of supported namespaces
>   rtld: Avoid using up static TLS surplus for optimizations [BZ #25051]

since Carlos reviewed v4
https://sourceware.org/pipermail/libc-alpha/2020-June/115179.html
i attach the v4 to v5 diff in case that helps the review.

i assume the tst-auditmany fix would be something like

void
_dl_tls_static_surplus_init (size_t naudit)
{
  nns = TUNABLE_GET (nns, size_t, NULL);
  if (nns > DL_NNS)
    nns = DL_NNS; // nns = 1 when !SHARED

  if (DL_NNS - nns < naudit)
    _dl_fatal_printf ("too many auditors");
  nns += naudit;

  GLRO(dl_tls_static_surplus) = nns * X + ...;
}

default nns=4 and DL_NNS=16 allows 12 audit modules
(tst-auditmany needs 9) and auditors don't use up
the surplus tls reserved for the application.

[-- Attachment #2: surplus-v5.diff --]
[-- Type: text/x-diff, Size: 17128 bytes --]

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 2396956266..4005caf84a 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -188,12 +188,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
   /* Initialize very early so that tunables can use it.  */
   __libc_init_secure ();
 
   __tunables_init (__environ);
 
-  _dl_static_tls_tunables_init ();
-
   ARCH_INIT_CPU_FEATURES ();
 
   /* Perform IREL{,A} relocations.  */
   ARCH_SETUP_IREL ();
 
diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index 62f0b0c8c3..fb77cd94fa 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -127,10 +127,13 @@ __libc_setup_tls (void)
 	  if (phdr->p_align > max_align)
 	    max_align = phdr->p_align;
 	  break;
 	}
 
+  /* Calculate the size of the static TLS surplus.  */
+  _dl_tls_static_surplus_init ();
+
   /* We have to set up the TCB block which also (possibly) contains
      'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
      Instead we use 'sbrk' which would only uses 'errno' if it fails.
      In this case we are right away out of memory and the user gets
      what she/he deserves.  */
diff --git a/elf/Makefile b/elf/Makefile
index b8bde1f47d..5fadaec27c 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -203,11 +203,11 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-sonamemove-link tst-sonamemove-dlopen tst-dlopen-tlsmodid \
 	 tst-dlopen-self tst-auditmany tst-initfinilazyfail tst-dlopenfail \
 	 tst-dlopenfail-2 \
 	 tst-filterobj tst-filterobj-dlopen tst-auxobj tst-auxobj-dlopen \
 	 tst-audit14 tst-audit15 tst-audit16 \
-	 tst-tls-ie
+	 tst-tls-ie tst-tls-ie-dlmopen
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
 	 tst-tls3 tst-tls6 tst-tls7 tst-tls8 tst-dlmopen2 \
 	 tst-ptrguard1 tst-stackguard1 tst-libc_dlvsym \
@@ -318,11 +318,12 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
 		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
 		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
 		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
 		tst-tls-ie-mod0 tst-tls-ie-mod1 tst-tls-ie-mod2 \
-		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5
+		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5 \
+		tst-tls-ie-mod6
 
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
 modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
 				   $(modules-names))
@@ -1758,6 +1759,17 @@ $(objpfx)tst-tls-ie.out: \
   $(objpfx)tst-tls-ie-mod0.so \
   $(objpfx)tst-tls-ie-mod1.so \
   $(objpfx)tst-tls-ie-mod2.so \
   $(objpfx)tst-tls-ie-mod3.so \
   $(objpfx)tst-tls-ie-mod4.so \
-  $(objpfx)tst-tls-ie-mod5.so
+  $(objpfx)tst-tls-ie-mod5.so \
+  $(objpfx)tst-tls-ie-mod6.so
+
+$(objpfx)tst-tls-ie-dlmopen: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls-ie-dlmopen.out: \
+  $(objpfx)tst-tls-ie-mod0.so \
+  $(objpfx)tst-tls-ie-mod1.so \
+  $(objpfx)tst-tls-ie-mod2.so \
+  $(objpfx)tst-tls-ie-mod3.so \
+  $(objpfx)tst-tls-ie-mod4.so \
+  $(objpfx)tst-tls-ie-mod5.so \
+  $(objpfx)tst-tls-ie-mod6.so
diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
index 68a780dcbd..854570821c 100644
--- a/elf/dl-sysdep.c
+++ b/elf/dl-sysdep.c
@@ -220,12 +220,10 @@ _dl_sysdep_start (void **start_argptr,
     }
 #endif
 
   __tunables_init (_environ);
 
-  _dl_static_tls_tunables_init ();
-
 #ifdef DL_SYSDEP_INIT
   DL_SYSDEP_INIT;
 #endif
 
 #ifdef DL_PLATFORM_INIT
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 740e33ea91..af5db12d08 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -27,11 +27,11 @@
 
 #include <tls.h>
 #include <dl-tls.h>
 #include <ldsodefs.h>
 
-#define TUNABLE_NAMESPACE dl
+#define TUNABLE_NAMESPACE rtld
 #include <dl-tunables.h>
 
 /* Surplus static TLS, GLRO(dl_tls_static_surplus), is used for
 
    - IE TLS in libc.so for all dlmopen namespaces except in the initial
@@ -43,25 +43,25 @@
    The maximum number of namespaces is DL_NNS, but to support that many
    namespaces correctly the static TLS allocation should be significantly
    increased, which may cause problems with small thread stacks due to the
    way static TLS is accounted (bug 11787).
 
-   So there is a dl.nns tunable limit on the number of supported namespaces
+   So there is a rtld.nns tunable limit on the number of supported namespaces
    that affects the size of the static TLS and by default it's small enough
    not to cause problems with existing applications. The limit is not
-   enforced or checked: it is the user's responsibility to increase dl.nns
+   enforced or checked: it is the user's responsibility to increase rtld.nns
    if more dlmopen namespaces are used.  */
 
 /* Size of initial-exec TLS in libc.so.  */
 #define LIBC_IE_TLS 192
 /* Size of initial-exec TLS in libraries other than libc.so.
    This should be large enough to cover runtime libraries of the
    compiler such as libgomp and libraries in libc other than libc.so.  */
 #define OTHER_IE_TLS 144
 
 void
-_dl_static_tls_tunables_init (void)
+_dl_tls_static_surplus_init (void)
 {
   size_t nns, opt_tls;
 
 #if HAVE_TUNABLES
   nns = TUNABLE_GET (nns, size_t, NULL);
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index 678f447e09..969e50327b 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -126,10 +126,6 @@ tunable_is_name (const char *orig, const char *envname)
   else
     return false;
 }
 
 #endif
-
-/* Initializers of tunables in the dl tunable namespace.  */
-void _dl_static_tls_tunables_init (void);
-
 #endif
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index ce46f28c7a..35634ef24d 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -125,18 +125,19 @@ glibc {
       type: INT_32
       default: 3
     }
   }
 
-  dl {
+  rtld {
     nns {
       type: SIZE_T
       minval: 1
       maxval: 16
       default: 4
     }
     optional_static_tls {
       type: SIZE_T
+      minval: 0
       default: 512
     }
   }
 }
diff --git a/elf/rtld.c b/elf/rtld.c
index f4c2602d65..f339f6894f 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -778,10 +778,13 @@ init_tls (void)
 	/* slotinfo[i].gen = 0; */
 	++i;
       }
   assert (i == GL(dl_tls_max_dtv_idx));
 
+  /* Calculate the size of the static TLS surplus.  */
+  _dl_tls_static_surplus_init ();
+
   /* Compute the TLS offsets for the various blocks.  */
   _dl_determine_tlsoffset ();
 
   /* Construct the static TLS block and the dtv for the initial
      thread.  For some platforms this will include allocating memory
diff --git a/elf/tst-tls-ie-dlmopen.c b/elf/tst-tls-ie-dlmopen.c
new file mode 100644
index 0000000000..0be47c7237
--- /dev/null
+++ b/elf/tst-tls-ie-dlmopen.c
@@ -0,0 +1,114 @@
+/* Test dlopen of modules with initial-exec TLS after dlmopen.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This test tries to check that surplus static TLS is not used up for
+   dynamic TLS optimizations and 4*144 = 576 bytes of static TLS is
+   still available for dlopening modules with initial-exec TLS after 3
+   new dlmopen namespaces are created.  It depends on rtld.nns=4 and
+   rtld.optional_static_tls=512 tunable settings.  */
+
+#include <errno.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static int do_test (void);
+#include <support/xthread.h>
+#include <support/xdlfcn.h>
+#include <support/test-driver.c>
+
+/* Have some big TLS in the main exe: should not use surplus TLS.  */
+__thread char maintls[1000];
+
+static pthread_barrier_t barrier;
+
+/* Forces multi-threaded behaviour.  */
+static void *
+blocked_thread_func (void *closure)
+{
+  xpthread_barrier_wait (&barrier);
+  /* TLS load and access tests run here in the main thread.  */
+  xpthread_barrier_wait (&barrier);
+  return NULL;
+}
+
+static void *
+load_and_access (Lmid_t lmid, const char *mod, const char *func)
+{
+  /* Load module with TLS.  */
+  void *p = xdlmopen (lmid, mod, RTLD_NOW);
+  /* Access the TLS variable to ensure it is allocated.  */
+  void (*f) (void) = (void (*) (void))xdlsym (p, func);
+  f ();
+  return p;
+}
+
+static int
+do_test (void)
+{
+  void *mods[5];
+
+  {
+    int ret = pthread_barrier_init (&barrier, NULL, 2);
+    if (ret != 0)
+      {
+        errno = ret;
+        printf ("error: pthread_barrier_init: %m\n");
+        exit (1);
+      }
+  }
+
+  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
+  xpthread_barrier_wait (&barrier);
+
+  printf ("maintls[%zu]:\t %p .. %p\n",
+	   sizeof maintls, maintls, maintls + sizeof maintls);
+  memset (maintls, 1, sizeof maintls);
+
+  /* Load modules with dynamic TLS (use surplus static TLS for libc
+     in new namespaces and may be for TLS optimizations too).  */
+  mods[0] = load_and_access (LM_ID_BASE, "tst-tls-ie-mod0.so", "access0");
+  mods[1] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod1.so", "access1");
+  mods[2] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod2.so", "access2");
+  mods[3] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod3.so", "access3");
+  /* Load modules with initial-exec TLS (can only use surplus static TLS).  */
+  mods[4] = load_and_access (LM_ID_BASE, "tst-tls-ie-mod6.so", "access6");
+
+  /* Here 576 bytes + 3 * libc use of surplus static TLS is in use so less
+     than 1024 bytes are available (exact number depends on TLS optimizations
+     and the libc TLS use).  */
+  printf ("The next dlmopen should fail...\n");
+  void *p = dlmopen (LM_ID_BASE, "tst-tls-ie-mod4.so", RTLD_NOW);
+  if (p != NULL)
+    {
+      printf ("error: expected dlmopen to fail because there is "
+	      "not enough surplus static TLS.\n");
+      exit (1);
+    }
+  printf ("...OK failed with: %s.\n", dlerror ());
+
+  xpthread_barrier_wait (&barrier);
+  xpthread_join (blocked_thread);
+
+  /* Close the modules.  */
+  for (int i = 0; i < 5; ++i)
+    xdlclose (mods[i]);
+
+  return 0;
+}
diff --git a/elf/tst-tls-ie-mod6.c b/elf/tst-tls-ie-mod6.c
new file mode 100644
index 0000000000..c736bf0684
--- /dev/null
+++ b/elf/tst-tls-ie-mod6.c
@@ -0,0 +1,4 @@
+#define N 6
+#define SIZE 576
+#define MODEL "initial-exec"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie.c b/elf/tst-tls-ie.c
index 2f00a2936d..c06454c50c 100644
--- a/elf/tst-tls-ie.c
+++ b/elf/tst-tls-ie.c
@@ -17,11 +17,11 @@
    <https://www.gnu.org/licenses/>.  */
 
 /* This test tries to check that surplus static TLS is not used up for
    dynamic TLS optimizations and 3*192 + 4*144 = 1152 bytes of static
    TLS is available for dlopening modules with initial-exec TLS.  It
-   depends on dl.nns=4 and dl.optional_static_tls=512 tunable setting.  */
+   depends on rtld.nns=4 and rtld.optional_static_tls=512 tunable setting.  */
 
 #include <errno.h>
 #include <pthread.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -78,19 +78,32 @@ do_test (void)
 
   printf ("maintls[%zu]:\t %p .. %p\n",
 	   sizeof maintls, maintls, maintls + sizeof maintls);
   memset (maintls, 1, sizeof maintls);
 
-  /* Load modules with dynamic TLS (may use surplus TLS opportunistically).  */
+  /* Load modules with dynamic TLS (may use surplus static TLS
+     opportunistically).  */
   mods[0] = load_and_access ("tst-tls-ie-mod0.so", "access0");
   mods[1] = load_and_access ("tst-tls-ie-mod1.so", "access1");
   mods[2] = load_and_access ("tst-tls-ie-mod2.so", "access2");
   mods[3] = load_and_access ("tst-tls-ie-mod3.so", "access3");
-  /* Load modules with initial-exec TLS (can only use surplus TLS).  */
+  /* Load modules with initial-exec TLS (can only use surplus static TLS).  */
   mods[4] = load_and_access ("tst-tls-ie-mod4.so", "access4");
   mods[5] = load_and_access ("tst-tls-ie-mod5.so", "access5");
 
+  /* Here 1152 bytes of surplus static TLS is in use and at most 512 bytes
+     are available (depending on TLS optimizations).  */
+  printf ("The next dlopen should fail...\n");
+  void *p = dlopen ("tst-tls-ie-mod6.so", RTLD_NOW);
+  if (p != NULL)
+    {
+      printf ("error: expected dlopen to fail because there is "
+	      "not enough surplus static TLS.\n");
+      exit (1);
+    }
+  printf ("...OK failed with: %s.\n", dlerror ());
+
   xpthread_barrier_wait (&barrier);
   xpthread_join (blocked_thread);
 
   /* Close the modules.  */
   for (int i = 0; i < 6; ++i)
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 437fdadff0..7f891c2710 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -228,33 +228,43 @@ passed to @code{malloc} for the largest bin size to enable.
 @end deftp
 
 @node Dynamic Linking Tunables
 @section Dynamic Linking Tunables
 @cindex dynamic linking tunables
-@cindex dl tunables
+@cindex rtld tunables
 
-@deftp {Tunable namespace} glibc.dl
+@deftp {Tunable namespace} glibc.rtld
 Dynamic linker behavior can be modified by setting the
-following tunables in the @code{dl} namespace:
+following tunables in the @code{rtld} namespace:
+@end deftp
+
+@deftp Tunable glibc.rtld.nns
+Sets the number of supported dynamic link namespaces (see @code{dlmopen}).
+Currently this limit can be set between 1 and 16 inclusive, the default is 4.
+Each link namespace consumes some memory in all thread, and thus raising the
+limit will increase the amount of memory each thread uses. Raising the limit
+is useful when your application uses more than 4 dynamic linker audit modules
+e.g. LD_AUDIT, or will use more than 4 dynamic link namespaces as created
+by @code{dlmopen} with an lmid argument of @code{LM_ID_NEWLM}.
+@end deftp
+
+@deftp Tunable glibc.rtld.optional_static_tls
+Sets the amount of surplus static TLS in bytes to allocate at program
+startup.  Every thread created allocates this amount of specified surplus
+static TLS. This is a minimum value and additional space may be allocated
+for internal purposes including alignment.  Optional static TLS is used for
+optimizing dynamic TLS access for platforms that support such optimizations
+e.g. TLS descriptors or optimized TLS access for POWER (@code{DT_PPC64_OPT}
+and @code{DT_PPC_OPT}).  In order to make the best use of such optimizations
+the value should be as many bytes as would be required to hold all TLS
+variables in all dynamic loaded shared libraries.  The value cannot be known
+by the dynamic loader because it doesn't know the expected set of shared
+libraries which will be loaded.  The existing static TLS space cannot be
+changed once allocated at process startup.  The default allocation of
+optional static TLS is 512 bytes and is allocated in every thread.
 @end deftp
 
-@deftp Tunable glibc.dl.nns
-Sets the number of supported dynamic link namespaces for which enough
-static TLS is allocated (see @code{dlmopen}).  If more namespaces are
-created then static TLS may run out at @code{dlopen} or @code{dlmopen}
-time which is a non-recoverable failure.  Currently this limit can be
-set between 1 and 16 inclusive, the default is 4. If the limit is
-increased then internally more static TLS is allocated to accomodate
-system libraries with initial-exec TLS in all namespaces.
-@end deftp
-
-@deftp Tunable glibc.dl.optional_static_tls
-Sets the amount of surplus static TLS that may be used for optimizing
-dynamic TLS access (only works on certain platforms, e.g. TLSDESC can
-be optimized this way). The internal allocation of static TLS is
-increased by this amount, the default is 512.
-@end deftp
 
 @node Elision Tunables
 @section Elision Tunables
 @cindex elision tunables
 @cindex tunables, elision
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index f631684583..997084fb4b 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1107,10 +1107,13 @@ extern size_t _dl_next_tls_modid (void) attribute_hidden;
 extern size_t _dl_count_modids (void) attribute_hidden;
 
 /* Calculate offset of the TLS blocks in the static TLS block.  */
 extern void _dl_determine_tlsoffset (void) attribute_hidden;
 
+/* Calculate the size of the static TLS surplus.  */
+void _dl_tls_static_surplus_init (void) attribute_hidden;
+
 #ifndef SHARED
 /* Set up the TCB for statically linked applications.  This is called
    early during startup because we always use TLS (for errno and the
    stack protector, among other things).  */
 void __libc_setup_tls (void);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-06-22 16:21 ` [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces Szabolcs Nagy
@ 2020-07-06 13:13   ` Carlos O'Donell
  2020-07-06 17:32     ` Szabolcs Nagy
  0 siblings, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2020-07-06 13:13 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha

On 6/22/20 12:21 PM, Szabolcs Nagy wrote:
> TLS_STATIC_SURPLUS is 1664 bytes currently which is not enough to
> support DL_NNS (== 16) number of dynamic link namespaces, if we
> assume 192 bytes of TLS are reserved for libc use and 144 bytes
> are reserved for other system libraries that use IE TLS.
> 
> A new tunable is introduced to control the number of supported
> namespaces and to adjust the surplus static TLS size as follows:
> 
> surplus_tls = 192 * (rtld.nns-1) + 144 * rtld.nns + 512
> 
> The default is rtld.nns == 4 and then the surplus TLS size is the
> same as before, so the behaviour is unchanged by default. If an
> application creates more namespaces than the rtld.nns setting
> allows, then it is not guaranteed to work, but the limit is not
> checked. So existing usage will continue to work, but in the
> future if an application creates more than 4 dynamic link
> namespaces then the tunable will need to be set.
> 
> In this patch DL_NNS is a fixed value and provides a maximum to
> the rtld.nns setting.
> 
> Static linking used fixed 2048 bytes surplus TLS, this is changed
> so the same contract is used as for dynamic linking.  With static
> linking DL_NNS == 1 so rtld.nns tunable is forced to 1, so by
> default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
> change is not expected to cause problems.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.

OK for master.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
 
> ---
> 
> v5:
> - Split the patch into two: rtld.nns and tls optimization parts.
> - Use rtld.nns instead of dl.nns.
> - Renamed the init function and moved the calls close to
>   the first use of surplus tls size.
> - Updated the tunable documentation.
> v4:
> - Rebased and moved this log out of the commit message.
> - Minor commit message wording changes.
> v3:
> - archived at
>   https://sourceware.org/pipermail/libc-alpha/2020-March/111660.html
> - Replace TLS_STATIC_SURPLUS with GLRO(dl_tls_static_surplus) and
>   simplify related logic.
> - In case of static linking, replace GL(dl_tls_static_size) with
>   GLRO(dl_tls_static_surplus) in the code paths before the
>   GL(dl_tls_static_size) value is actually computed.
> - Update comments and the test code.
> - Document the new tunables.
> - Update description, mention static linking.
> v2:
> - Add dl.nns tunable.
> - Add dl.optional_static_tls tunable.
> - New surplus TLS usage contract that works reliably up to dl.nns
>   namespaces.
> ---
>  csu/libc-tls.c             | 28 +++++++++----------
>  elf/dl-tls.c               | 55 ++++++++++++++++++++++++++++++++++----
>  elf/dl-tunables.list       |  9 +++++++
>  elf/rtld.c                 |  3 +++
>  manual/tunables.texi       | 21 +++++++++++++++
>  sysdeps/generic/ldsodefs.h |  8 ++++++
>  6 files changed, 105 insertions(+), 19 deletions(-)
> 
> diff --git a/csu/libc-tls.c b/csu/libc-tls.c
> index 73ade0fec5..e2603157e8 100644
> --- a/csu/libc-tls.c
> +++ b/csu/libc-tls.c
> @@ -46,13 +46,16 @@ bool _dl_tls_dtv_gaps;
>  struct dtv_slotinfo_list *_dl_tls_dtv_slotinfo_list;
>  /* Number of modules in the static TLS block.  */
>  size_t _dl_tls_static_nelem;
> -/* Size of the static TLS block.  Giving this initialized value
> -   preallocates some surplus bytes in the static TLS area.  */
> -size_t _dl_tls_static_size = 2048;
> +/* Size of the static TLS block.  */
> +size_t _dl_tls_static_size;

OK.

>  /* Size actually allocated in the static TLS block.  */
>  size_t _dl_tls_static_used;
>  /* Alignment requirement of the static TLS block.  */
>  size_t _dl_tls_static_align;
> +/* Size of surplus space in the static TLS area for dynamically
> +   loaded modules with IE-model TLS or for TLSDESC optimization.
> +   See comments in elf/dl-tls.c where it is initialized.  */
> +size_t _dl_tls_static_surplus;

OK.

>  
>  /* Generation counter for the dtv.  */
>  size_t _dl_tls_generation;
> @@ -81,10 +84,8 @@ init_slotinfo (void)
>  static void
>  init_static_tls (size_t memsz, size_t align)
>  {
> -  /* That is the size of the TLS memory for this object.  The initialized
> -     value of _dl_tls_static_size is provided by dl-open.c to request some
> -     surplus that permits dynamic loading of modules with IE-model TLS.  */
> -  GL(dl_tls_static_size) = roundup (memsz + GL(dl_tls_static_size),
> +  /* That is the size of the TLS memory for this object.  */
> +  GL(dl_tls_static_size) = roundup (memsz + GLRO(dl_tls_static_surplus),

OK.

>  				    TLS_TCB_ALIGN);
>  #if TLS_TCB_AT_TP
>    GL(dl_tls_static_size) += TLS_TCB_SIZE;
> @@ -125,25 +126,24 @@ __libc_setup_tls (void)
>  	  break;
>  	}
>  
> +  /* Calculate the size of the static TLS surplus.  */
> +  _dl_tls_static_surplus_init ();

OK.

> +
>    /* We have to set up the TCB block which also (possibly) contains
>       'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
>       Instead we use 'sbrk' which would only uses 'errno' if it fails.
>       In this case we are right away out of memory and the user gets
> -     what she/he deserves.
> -
> -     The initialized value of _dl_tls_static_size is provided by dl-open.c
> -     to request some surplus that permits dynamic loading of modules with
> -     IE-model TLS.  */
> +     what she/he deserves.  */

OK.

>  #if TLS_TCB_AT_TP
>    /* Align the TCB offset to the maximum alignment, as
>       _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
>       and dl_tls_static_align.  */
> -  tcb_offset = roundup (memsz + GL(dl_tls_static_size), max_align);
> +  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);

OK.

>    tlsblock = __sbrk (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
>  #elif TLS_DTV_AT_TP
>    tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
>    tlsblock = __sbrk (tcb_offset + memsz + max_align
> -		     + TLS_PRE_TCB_SIZE + GL(dl_tls_static_size));
> +		     + TLS_PRE_TCB_SIZE + GLRO(dl_tls_static_surplus));

OK.

>    tlsblock += TLS_PRE_TCB_SIZE;
>  #else
>    /* In case a model with a different layout for the TCB and DTV
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index fa03234610..2201a1cc1d 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -29,10 +29,54 @@
>  #include <dl-tls.h>
>  #include <ldsodefs.h>
>  
> -/* Amount of excess space to allocate in the static TLS area
> -   to allow dynamic loading of modules defining IE-model TLS data.  */
> -#define TLS_STATIC_SURPLUS	64 + DL_NNS * 100
> +#define TUNABLE_NAMESPACE rtld
> +#include <dl-tunables.h>
> +
> +/* Surplus static TLS, GLRO(dl_tls_static_surplus), is used for
> +
> +   - IE TLS in libc.so for all dlmopen namespaces except in the initial
> +     one where libc.so is not loaded dynamically but at startup time,
> +   - IE TLS in other libraries which may be dynamically loaded even in the
> +     initial namespace,
> +   - and optionally for optimizing dynamic TLS access.
> +
> +   The maximum number of namespaces is DL_NNS, but to support that many
> +   namespaces correctly the static TLS allocation should be significantly
> +   increased, which may cause problems with small thread stacks due to the
> +   way static TLS is accounted (bug 11787).
> +
> +   So there is a rtld.nns tunable limit on the number of supported namespaces
> +   that affects the size of the static TLS and by default it's small enough
> +   not to cause problems with existing applications. The limit is not
> +   enforced or checked: it is the user's responsibility to increase rtld.nns
> +   if more dlmopen namespaces are used.  */

OK.

> +
> +/* Size of initial-exec TLS in libc.so.  */
> +#define LIBC_IE_TLS 192
> +/* Size of initial-exec TLS in libraries other than libc.so.
> +   This should be large enough to cover runtime libraries of the
> +   compiler such as libgomp and libraries in libc other than libc.so.  */
> +#define OTHER_IE_TLS 144
> +/* Size of additional surplus TLS, placeholder for TLS optimizations.  */
> +#define OPT_SURPLUS_TLS 512
>  
> +void
> +_dl_tls_static_surplus_init (void)
> +{
> +  size_t nns;
> +
> +#if HAVE_TUNABLES
> +  nns = TUNABLE_GET (nns, size_t, NULL);
> +#else
> +  /* Default values of the tunables.  */
> +  nns = 4;
> +#endif
> +  if (nns > DL_NNS)
> +    nns = DL_NNS;
> +  GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
> +				 + nns * OTHER_IE_TLS
> +				 + OPT_SURPLUS_TLS);
> +}

OK.

>  
>  /* Out-of-memory handler.  */
>  static void
> @@ -218,7 +262,8 @@ _dl_determine_tlsoffset (void)
>      }
>  
>    GL(dl_tls_static_used) = offset;
> -  GL(dl_tls_static_size) = (roundup (offset + TLS_STATIC_SURPLUS, max_align)
> +  GL(dl_tls_static_size) = (roundup (offset + GLRO(dl_tls_static_surplus),
> +				     max_align)
>  			    + TLS_TCB_SIZE);
>  #elif TLS_DTV_AT_TP
>    /* The TLS blocks start right after the TCB.  */
> @@ -262,7 +307,7 @@ _dl_determine_tlsoffset (void)
>      }
>  
>    GL(dl_tls_static_used) = offset;
> -  GL(dl_tls_static_size) = roundup (offset + TLS_STATIC_SURPLUS,
> +  GL(dl_tls_static_size) = roundup (offset + GLRO(dl_tls_static_surplus),
>  				    TLS_TCB_ALIGN);
>  #else
>  # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 0d398dd251..b07742d7b3 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -126,4 +126,13 @@ glibc {
>        default: 3
>      }
>    }
> +
> +  rtld {
> +    nns {
> +      type: SIZE_T
> +      minval: 1
> +      maxval: 16
> +      default: 4
> +    }

OK.

> +  }
>  }
> diff --git a/elf/rtld.c b/elf/rtld.c
> index f4c2602d65..f339f6894f 100644
> --- a/elf/rtld.c
> +++ b/elf/rtld.c
> @@ -780,6 +780,9 @@ init_tls (void)
>        }
>    assert (i == GL(dl_tls_max_dtv_idx));
>  
> +  /* Calculate the size of the static TLS surplus.  */
> +  _dl_tls_static_surplus_init ();
> +
>    /* Compute the TLS offsets for the various blocks.  */
>    _dl_determine_tlsoffset ();
>  
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index ec18b10834..978e08f4fb 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -31,6 +31,7 @@ their own namespace.
>  @menu
>  * Tunable names::  The structure of a tunable name
>  * Memory Allocation Tunables::  Tunables in the memory allocation subsystem
> +* Dynamic Linking Tunables:: Tunables in the dynamic linking subsystem
>  * Elision Tunables::  Tunables in elision subsystem
>  * POSIX Thread Tunables:: Tunables in the POSIX thread subsystem
>  * Hardware Capability Tunables::  Tunables that modify the hardware
> @@ -226,6 +227,26 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>  passed to @code{malloc} for the largest bin size to enable.
>  @end deftp
>  
> +@node Dynamic Linking Tunables
> +@section Dynamic Linking Tunables
> +@cindex dynamic linking tunables
> +@cindex rtld tunables
> +
> +@deftp {Tunable namespace} glibc.rtld
> +Dynamic linker behavior can be modified by setting the
> +following tunables in the @code{rtld} namespace:
> +@end deftp
> +
> +@deftp Tunable glibc.rtld.nns
> +Sets the number of supported dynamic link namespaces (see @code{dlmopen}).
> +Currently this limit can be set between 1 and 16 inclusive, the default is 4.
> +Each link namespace consumes some memory in all thread, and thus raising the
> +limit will increase the amount of memory each thread uses. Raising the limit
> +is useful when your application uses more than 4 dynamic linker audit modules
> +e.g. LD_AUDIT, or will use more than 4 dynamic link namespaces as created
> +by @code{dlmopen} with an lmid argument of @code{LM_ID_NEWLM}.
> +@end deftp

OK.

> +
>  @node Elision Tunables
>  @section Elision Tunables
>  @cindex elision tunables
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index c525ffa12c..3b0c6d9620 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -583,6 +583,11 @@ struct rtld_global_ro
>       binaries, don't honor for PIEs).  */
>    EXTERN ElfW(Addr) _dl_use_load_bias;
>  
> +  /* Size of surplus space in the static TLS area for dynamically
> +     loaded modules with IE-model TLS or for TLSDESC optimization.
> +     See comments in elf/dl-tls.c where it is initialized.  */
> +  EXTERN size_t _dl_tls_static_surplus;
> +
>    /* Name of the shared object to be profiled (if any).  */
>    EXTERN const char *_dl_profile;
>    /* Filename of the output file.  */
> @@ -1101,6 +1106,9 @@ extern size_t _dl_count_modids (void) attribute_hidden;
>  /* Calculate offset of the TLS blocks in the static TLS block.  */
>  extern void _dl_determine_tlsoffset (void) attribute_hidden;
>  
> +/* Calculate the size of the static TLS surplus.  */
> +void _dl_tls_static_surplus_init (void) attribute_hidden;
> +
>  #ifndef SHARED
>  /* Set up the TCB for statically linked applications.  This is called
>     early during startup because we always use TLS (for errno and the
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/2] rtld: Avoid using up static TLS surplus for optimizations [BZ #25051]
  2020-06-22 16:21 ` [PATCH v5 2/2] rtld: Avoid using up static TLS surplus for optimizations [BZ #25051] Szabolcs Nagy
@ 2020-07-06 13:19   ` Carlos O'Donell
  0 siblings, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2020-07-06 13:19 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha

On 6/22/20 12:21 PM, Szabolcs Nagy wrote:
> On some targets static TLS surplus area can be used opportunistically
> for dynamically loaded modules such that the TLS access then becomes
> faster (TLSDESC and powerpc TLS optimization). However we don't want
> all surplus TLS to be used for this optimization because dynamically
> loaded modules with initial-exec model TLS can only use surplus TLS.
> 
> The new contract for surplus static TLS use is:
> 
> - libc.so can have up to 192 bytes of IE TLS,
> - other system libraries together can have up to 144 bytes of IE TLS.
> - Some "optional" static TLS is available for opportunistic use.
> 
> The optional TLS is now tunable: rtld.optional_static_tls, so users
> can directly affect the allocated static TLS size. (Note that module
> unloading with dlclose does not reclaim static TLS. After the optional
> TLS runs out, TLS access is no longer optimized to use static TLS.)
> 
> The default setting of rtld.optional_static_tls is 512 so the surplus
> TLS is 3*192 + 4*144 + 512 = 1664 by default, the same as before.
> 
> Fixes BZ #25051.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.

OK for mater.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

 
> ---
> 
> v5:
> - Split the patch into two: rtld.nns and tls optimization parts.
> - Use rtld tunable namespace.
> - Set tunable min val to 0.
> - New test using dlmopen.
> - Tests now have check for failing IE TLS dlopen.
> - Updated the tunable documentation.
> v4:
> - Rebased and moved this log out of the commit message.
> - Minor commit message wording changes.
> v3:
> - archived at
>   https://sourceware.org/pipermail/libc-alpha/2020-March/111660.html
> - Replace TLS_STATIC_SURPLUS with GLRO(dl_tls_static_surplus) and
>   simplify related logic.
> - In case of static linking, replace GL(dl_tls_static_size) with
>   GLRO(dl_tls_static_surplus) in the code paths before the
>   GL(dl_tls_static_size) value is actually computed.
> - Update comments and the test code.
> - Document the new tunables.
> - Update description, mention static linking.
> v2:
> - Add dl.nns tunable.
> - Add dl.optional_static_tls tunable.
> - New surplus TLS usage contract that works reliably up to dl.nns
>   namespaces.
> ---
>  csu/libc-tls.c             |   3 +
>  elf/Makefile               |  29 +++++++++-
>  elf/dl-reloc.c             |  37 +++++++++---
>  elf/dl-tls.c               |   9 +--
>  elf/dl-tunables.list       |   5 ++
>  elf/dynamic-link.h         |   5 +-
>  elf/tst-tls-ie-dlmopen.c   | 114 +++++++++++++++++++++++++++++++++++++
>  elf/tst-tls-ie-mod.h       |  40 +++++++++++++
>  elf/tst-tls-ie-mod0.c      |   4 ++
>  elf/tst-tls-ie-mod1.c      |   4 ++
>  elf/tst-tls-ie-mod2.c      |   4 ++
>  elf/tst-tls-ie-mod3.c      |   4 ++
>  elf/tst-tls-ie-mod4.c      |   4 ++
>  elf/tst-tls-ie-mod5.c      |   4 ++
>  elf/tst-tls-ie-mod6.c      |   4 ++
>  elf/tst-tls-ie.c           | 113 ++++++++++++++++++++++++++++++++++++
>  manual/tunables.texi       |  17 ++++++
>  sysdeps/generic/ldsodefs.h |   3 +
>  18 files changed, 386 insertions(+), 17 deletions(-)
>  create mode 100644 elf/tst-tls-ie-dlmopen.c
>  create mode 100644 elf/tst-tls-ie-mod.h
>  create mode 100644 elf/tst-tls-ie-mod0.c
>  create mode 100644 elf/tst-tls-ie-mod1.c
>  create mode 100644 elf/tst-tls-ie-mod2.c
>  create mode 100644 elf/tst-tls-ie-mod3.c
>  create mode 100644 elf/tst-tls-ie-mod4.c
>  create mode 100644 elf/tst-tls-ie-mod5.c
>  create mode 100644 elf/tst-tls-ie-mod6.c
>  create mode 100644 elf/tst-tls-ie.c
> 
> diff --git a/csu/libc-tls.c b/csu/libc-tls.c
> index e2603157e8..fb77cd94fa 100644
> --- a/csu/libc-tls.c
> +++ b/csu/libc-tls.c
> @@ -56,6 +56,9 @@ size_t _dl_tls_static_align;
>     loaded modules with IE-model TLS or for TLSDESC optimization.
>     See comments in elf/dl-tls.c where it is initialized.  */
>  size_t _dl_tls_static_surplus;
> +/* Remaining amount of static TLS that may be used for optimizing
> +   dynamic TLS access (e.g. with TLSDESC).  */
> +size_t _dl_tls_static_optional;

OK.

>  
>  /* Generation counter for the dtv.  */
>  size_t _dl_tls_generation;
> diff --git a/elf/Makefile b/elf/Makefile
> index 6fe1df90bb..5fadaec27c 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -204,7 +204,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
>  	 tst-dlopen-self tst-auditmany tst-initfinilazyfail tst-dlopenfail \
>  	 tst-dlopenfail-2 \
>  	 tst-filterobj tst-filterobj-dlopen tst-auxobj tst-auxobj-dlopen \
> -	 tst-audit14 tst-audit15 tst-audit16
> +	 tst-audit14 tst-audit15 tst-audit16 \
> +	 tst-tls-ie tst-tls-ie-dlmopen

OK.

>  #	 reldep9
>  tests-internal += loadtest unload unload2 circleload1 \
>  	 neededtest neededtest2 neededtest3 neededtest4 \
> @@ -317,7 +318,11 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
>  		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
>  		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
>  		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
> -		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3
> +		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
> +		tst-tls-ie-mod0 tst-tls-ie-mod1 tst-tls-ie-mod2 \
> +		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5 \
> +		tst-tls-ie-mod6

OK.

> +
>  # Most modules build with _ISOMAC defined, but those filtered out
>  # depend on internal headers.
>  modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
> @@ -1748,3 +1753,23 @@ $(objpfx)tst-auxobj: $(objpfx)tst-filterobj-aux.so
>  $(objpfx)tst-auxobj-dlopen: $(libdl)
>  $(objpfx)tst-auxobj.out: $(objpfx)tst-filterobj-filtee.so
>  $(objpfx)tst-auxobj-dlopen.out: $(objpfx)tst-filterobj-filtee.so
> +
> +$(objpfx)tst-tls-ie: $(libdl) $(shared-thread-library)
> +$(objpfx)tst-tls-ie.out: \
> +  $(objpfx)tst-tls-ie-mod0.so \
> +  $(objpfx)tst-tls-ie-mod1.so \
> +  $(objpfx)tst-tls-ie-mod2.so \
> +  $(objpfx)tst-tls-ie-mod3.so \
> +  $(objpfx)tst-tls-ie-mod4.so \
> +  $(objpfx)tst-tls-ie-mod5.so \
> +  $(objpfx)tst-tls-ie-mod6.so

OK.

> +
> +$(objpfx)tst-tls-ie-dlmopen: $(libdl) $(shared-thread-library)
> +$(objpfx)tst-tls-ie-dlmopen.out: \
> +  $(objpfx)tst-tls-ie-mod0.so \
> +  $(objpfx)tst-tls-ie-mod1.so \
> +  $(objpfx)tst-tls-ie-mod2.so \
> +  $(objpfx)tst-tls-ie-mod3.so \
> +  $(objpfx)tst-tls-ie-mod4.so \
> +  $(objpfx)tst-tls-ie-mod5.so \
> +  $(objpfx)tst-tls-ie-mod6.so

OK.

> diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
> index ffcc84d396..6d32e49467 100644
> --- a/elf/dl-reloc.c
> +++ b/elf/dl-reloc.c
> @@ -39,13 +39,16 @@
>  /* We are trying to perform a static TLS relocation in MAP, but it was
>     dynamically loaded.  This can only work if there is enough surplus in
>     the static TLS area already allocated for each running thread.  If this
> -   object's TLS segment is too big to fit, we fail.  If it fits,
> -   we set MAP->l_tls_offset and return.
> -   This function intentionally does not return any value but signals error
> -   directly, as static TLS should be rare and code handling it should
> -   not be inlined as much as possible.  */
> +   object's TLS segment is too big to fit, we fail with -1.  If it fits,
> +   we set MAP->l_tls_offset and return 0.
> +   A portion of the surplus static TLS can be optionally used to optimize
> +   dynamic TLS access (with TLSDESC or powerpc TLS optimizations).
> +   If OPTIONAL is true then TLS is allocated for such optimization and
> +   the caller must have a fallback in case the optional portion of surplus
> +   TLS runs out.  If OPTIONAL is false then the entire surplus TLS area is
> +   considered and the allocation only fails if that runs out.  */
>  int
> -_dl_try_allocate_static_tls (struct link_map *map)
> +_dl_try_allocate_static_tls (struct link_map *map, bool optional)

OK.

>  {
>    /* If we've already used the variable with dynamic access, or if the
>       alignment requirements are too high, fail.  */
> @@ -68,8 +71,14 @@ _dl_try_allocate_static_tls (struct link_map *map)
>  
>    size_t n = (freebytes - blsize) / map->l_tls_align;
>  
> -  size_t offset = GL(dl_tls_static_used) + (freebytes - n * map->l_tls_align
> -					    - map->l_tls_firstbyte_offset);
> +  /* Account optional static TLS surplus usage.  */
> +  size_t use = freebytes - n * map->l_tls_align - map->l_tls_firstbyte_offset;
> +  if (optional && use > GL(dl_tls_static_optional))
> +    goto fail;
> +  else if (optional)
> +    GL(dl_tls_static_optional) -= use;
> +
> +  size_t offset = GL(dl_tls_static_used) + use;
>  
>    map->l_tls_offset = GL(dl_tls_static_used) = offset;
>  #elif TLS_DTV_AT_TP
> @@ -83,6 +92,13 @@ _dl_try_allocate_static_tls (struct link_map *map)
>    if (used > GL(dl_tls_static_size))
>      goto fail;
>  
> +  /* Account optional static TLS surplus usage.  */
> +  size_t use = used - GL(dl_tls_static_used);
> +  if (optional && use > GL(dl_tls_static_optional))
> +    goto fail;
> +  else if (optional)
> +    GL(dl_tls_static_optional) -= use;
> +
>    map->l_tls_offset = offset;
>    map->l_tls_firstbyte_offset = GL(dl_tls_static_used);
>    GL(dl_tls_static_used) = used;
> @@ -110,12 +126,15 @@ _dl_try_allocate_static_tls (struct link_map *map)
>    return 0;
>  }
>  
> +/* This function intentionally does not return any value but signals error
> +   directly, as static TLS should be rare and code handling it should
> +   not be inlined as much as possible.  */
>  void
>  __attribute_noinline__
>  _dl_allocate_static_tls (struct link_map *map)
>  {
>    if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
> -      || _dl_try_allocate_static_tls (map))
> +      || _dl_try_allocate_static_tls (map, false))
>      {
>        _dl_signal_error (0, map->l_name, NULL, N_("\
>  cannot allocate memory in static TLS block"));
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 2201a1cc1d..af5db12d08 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -57,25 +57,26 @@
>     This should be large enough to cover runtime libraries of the
>     compiler such as libgomp and libraries in libc other than libc.so.  */
>  #define OTHER_IE_TLS 144
> -/* Size of additional surplus TLS, placeholder for TLS optimizations.  */
> -#define OPT_SURPLUS_TLS 512
>  
>  void
>  _dl_tls_static_surplus_init (void)
>  {
> -  size_t nns;
> +  size_t nns, opt_tls;
>  
>  #if HAVE_TUNABLES
>    nns = TUNABLE_GET (nns, size_t, NULL);
> +  opt_tls = TUNABLE_GET (optional_static_tls, size_t, NULL);
>  #else
>    /* Default values of the tunables.  */
>    nns = 4;
> +  opt_tls = 512;
>  #endif
>    if (nns > DL_NNS)
>      nns = DL_NNS;
> +  GL(dl_tls_static_optional) = opt_tls;
>    GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
>  				 + nns * OTHER_IE_TLS
> -				 + OPT_SURPLUS_TLS);
> +				 + opt_tls);

OK.

>  }
>  
>  /* Out-of-memory handler.  */
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index b07742d7b3..35634ef24d 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -134,5 +134,10 @@ glibc {
>        maxval: 16
>        default: 4
>      }
> +    optional_static_tls {
> +      type: SIZE_T
> +      minval: 0
> +      default: 512

OK.

> +    }
>    }
>  }
> diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
> index bb7a66f4cd..6727233e1a 100644
> --- a/elf/dynamic-link.h
> +++ b/elf/dynamic-link.h
> @@ -40,9 +40,10 @@
>      (__builtin_expect ((sym_map)->l_tls_offset				\
>  		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
>       && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
> -	 || _dl_try_allocate_static_tls (sym_map) == 0))
> +	 || _dl_try_allocate_static_tls (sym_map, true) == 0))
>  
> -int _dl_try_allocate_static_tls (struct link_map *map) attribute_hidden;
> +int _dl_try_allocate_static_tls (struct link_map *map, bool optional)
> +  attribute_hidden;
>  
>  #include <elf.h>
>  
> diff --git a/elf/tst-tls-ie-dlmopen.c b/elf/tst-tls-ie-dlmopen.c
> new file mode 100644
> index 0000000000..0be47c7237
> --- /dev/null
> +++ b/elf/tst-tls-ie-dlmopen.c
> @@ -0,0 +1,114 @@
> +/* Test dlopen of modules with initial-exec TLS after dlmopen.
> +   Copyright (C) 2016-2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* This test tries to check that surplus static TLS is not used up for
> +   dynamic TLS optimizations and 4*144 = 576 bytes of static TLS is
> +   still available for dlopening modules with initial-exec TLS after 3
> +   new dlmopen namespaces are created.  It depends on rtld.nns=4 and
> +   rtld.optional_static_tls=512 tunable settings.  */

OK. Good comment.

> +
> +#include <errno.h>
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +static int do_test (void);
> +#include <support/xthread.h>
> +#include <support/xdlfcn.h>
> +#include <support/test-driver.c>
> +
> +/* Have some big TLS in the main exe: should not use surplus TLS.  */
> +__thread char maintls[1000];
> +
> +static pthread_barrier_t barrier;
> +
> +/* Forces multi-threaded behaviour.  */
> +static void *
> +blocked_thread_func (void *closure)
> +{
> +  xpthread_barrier_wait (&barrier);
> +  /* TLS load and access tests run here in the main thread.  */
> +  xpthread_barrier_wait (&barrier);
> +  return NULL;
> +}
> +
> +static void *
> +load_and_access (Lmid_t lmid, const char *mod, const char *func)
> +{
> +  /* Load module with TLS.  */
> +  void *p = xdlmopen (lmid, mod, RTLD_NOW);
> +  /* Access the TLS variable to ensure it is allocated.  */
> +  void (*f) (void) = (void (*) (void))xdlsym (p, func);
> +  f ();
> +  return p;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  void *mods[5];
> +
> +  {
> +    int ret = pthread_barrier_init (&barrier, NULL, 2);
> +    if (ret != 0)
> +      {
> +        errno = ret;
> +        printf ("error: pthread_barrier_init: %m\n");
> +        exit (1);
> +      }
> +  }
> +
> +  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
> +  xpthread_barrier_wait (&barrier);
> +
> +  printf ("maintls[%zu]:\t %p .. %p\n",
> +	   sizeof maintls, maintls, maintls + sizeof maintls);
> +  memset (maintls, 1, sizeof maintls);
> +
> +  /* Load modules with dynamic TLS (use surplus static TLS for libc
> +     in new namespaces and may be for TLS optimizations too).  */
> +  mods[0] = load_and_access (LM_ID_BASE, "tst-tls-ie-mod0.so", "access0");
> +  mods[1] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod1.so", "access1");
> +  mods[2] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod2.so", "access2");
> +  mods[3] = load_and_access (LM_ID_NEWLM, "tst-tls-ie-mod3.so", "access3");
> +  /* Load modules with initial-exec TLS (can only use surplus static TLS).  */
> +  mods[4] = load_and_access (LM_ID_BASE, "tst-tls-ie-mod6.so", "access6");
> +
> +  /* Here 576 bytes + 3 * libc use of surplus static TLS is in use so less
> +     than 1024 bytes are available (exact number depends on TLS optimizations
> +     and the libc TLS use).  */
> +  printf ("The next dlmopen should fail...\n");
> +  void *p = dlmopen (LM_ID_BASE, "tst-tls-ie-mod4.so", RTLD_NOW);
> +  if (p != NULL)
> +    {
> +      printf ("error: expected dlmopen to fail because there is "
> +	      "not enough surplus static TLS.\n");
> +      exit (1);

OK. You could have used FAIL_EXIT1 to merge both.

> +    }
> +  printf ("...OK failed with: %s.\n", dlerror ());
> +
> +  xpthread_barrier_wait (&barrier);
> +  xpthread_join (blocked_thread);
> +
> +  /* Close the modules.  */
> +  for (int i = 0; i < 5; ++i)
> +    xdlclose (mods[i]);
> +
> +  return 0;
> +}
> diff --git a/elf/tst-tls-ie-mod.h b/elf/tst-tls-ie-mod.h
> new file mode 100644
> index 0000000000..46b362a9b7
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod.h
> @@ -0,0 +1,40 @@
> +/* Module with specified TLS size and model.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* This file is parameterized by macros N, SIZE and MODEL.  */
> +
> +#include <stdio.h>
> +#include <string.h>
> +
> +#define CONCATX(x, y) x ## y
> +#define CONCAT(x, y) CONCATX (x, y)
> +#define STRX(x) #x
> +#define STR(x) STRX (x)
> +
> +#define VAR CONCAT (var, N)
> +
> +__attribute__ ((aligned (8), tls_model (MODEL)))
> +__thread char VAR[SIZE];
> +
> +void
> +CONCAT (access, N) (void)
> +{
> +  printf (STR (VAR) "[%d]:\t %p .. %p " MODEL "\n", SIZE, VAR, VAR + SIZE);
> +  fflush (stdout);
> +  memset (VAR, 1, SIZE);
> +}
> diff --git a/elf/tst-tls-ie-mod0.c b/elf/tst-tls-ie-mod0.c
> new file mode 100644
> index 0000000000..2450686e40
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod0.c
> @@ -0,0 +1,4 @@
> +#define N 0
> +#define SIZE 480
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod1.c b/elf/tst-tls-ie-mod1.c
> new file mode 100644
> index 0000000000..849ff91e53
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod1.c
> @@ -0,0 +1,4 @@
> +#define N 1
> +#define SIZE 120
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod2.c b/elf/tst-tls-ie-mod2.c
> new file mode 100644
> index 0000000000..23915ab67b
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod2.c
> @@ -0,0 +1,4 @@
> +#define N 2
> +#define SIZE 24
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod3.c b/elf/tst-tls-ie-mod3.c
> new file mode 100644
> index 0000000000..5395f844a5
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod3.c
> @@ -0,0 +1,4 @@
> +#define N 3
> +#define SIZE 16
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod4.c b/elf/tst-tls-ie-mod4.c
> new file mode 100644
> index 0000000000..93ac2eacae
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod4.c
> @@ -0,0 +1,4 @@
> +#define N 4
> +#define SIZE 1024
> +#define MODEL "initial-exec"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod5.c b/elf/tst-tls-ie-mod5.c
> new file mode 100644
> index 0000000000..84b3fd285b
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod5.c
> @@ -0,0 +1,4 @@
> +#define N 5
> +#define SIZE 128
> +#define MODEL "initial-exec"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod6.c b/elf/tst-tls-ie-mod6.c
> new file mode 100644
> index 0000000000..c736bf0684
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod6.c
> @@ -0,0 +1,4 @@
> +#define N 6
> +#define SIZE 576
> +#define MODEL "initial-exec"
> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie.c b/elf/tst-tls-ie.c
> new file mode 100644
> index 0000000000..c06454c50c
> --- /dev/null
> +++ b/elf/tst-tls-ie.c
> @@ -0,0 +1,113 @@
> +/* Test dlopen of modules with initial-exec TLS.
> +   Copyright (C) 2016-2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* This test tries to check that surplus static TLS is not used up for
> +   dynamic TLS optimizations and 3*192 + 4*144 = 1152 bytes of static
> +   TLS is available for dlopening modules with initial-exec TLS.  It
> +   depends on rtld.nns=4 and rtld.optional_static_tls=512 tunable setting.  */
> +
> +#include <errno.h>
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +static int do_test (void);
> +#include <support/xthread.h>
> +#include <support/xdlfcn.h>
> +#include <support/test-driver.c>
> +
> +/* Have some big TLS in the main exe: should not use surplus TLS.  */
> +__thread char maintls[1000];
> +
> +static pthread_barrier_t barrier;
> +
> +/* Forces multi-threaded behaviour.  */
> +static void *
> +blocked_thread_func (void *closure)
> +{
> +  xpthread_barrier_wait (&barrier);
> +  /* TLS load and access tests run here in the main thread.  */
> +  xpthread_barrier_wait (&barrier);
> +  return NULL;
> +}
> +
> +static void *
> +load_and_access (const char *mod, const char *func)
> +{
> +  /* Load module with TLS.  */
> +  void *p = xdlopen (mod, RTLD_NOW);
> +  /* Access the TLS variable to ensure it is allocated.  */
> +  void (*f) (void) = (void (*) (void))xdlsym (p, func);
> +  f ();
> +  return p;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  void *mods[6];
> +
> +  {
> +    int ret = pthread_barrier_init (&barrier, NULL, 2);
> +    if (ret != 0)
> +      {
> +        errno = ret;
> +        printf ("error: pthread_barrier_init: %m\n");
> +        exit (1);
> +      }
> +  }
> +
> +  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
> +  xpthread_barrier_wait (&barrier);
> +
> +  printf ("maintls[%zu]:\t %p .. %p\n",
> +	   sizeof maintls, maintls, maintls + sizeof maintls);
> +  memset (maintls, 1, sizeof maintls);
> +
> +  /* Load modules with dynamic TLS (may use surplus static TLS
> +     opportunistically).  */
> +  mods[0] = load_and_access ("tst-tls-ie-mod0.so", "access0");
> +  mods[1] = load_and_access ("tst-tls-ie-mod1.so", "access1");
> +  mods[2] = load_and_access ("tst-tls-ie-mod2.so", "access2");
> +  mods[3] = load_and_access ("tst-tls-ie-mod3.so", "access3");
> +  /* Load modules with initial-exec TLS (can only use surplus static TLS).  */
> +  mods[4] = load_and_access ("tst-tls-ie-mod4.so", "access4");
> +  mods[5] = load_and_access ("tst-tls-ie-mod5.so", "access5");
> +
> +  /* Here 1152 bytes of surplus static TLS is in use and at most 512 bytes
> +     are available (depending on TLS optimizations).  */
> +  printf ("The next dlopen should fail...\n");
> +  void *p = dlopen ("tst-tls-ie-mod6.so", RTLD_NOW);
> +  if (p != NULL)
> +    {
> +      printf ("error: expected dlopen to fail because there is "
> +	      "not enough surplus static TLS.\n");
> +      exit (1);

OK.

> +    }
> +  printf ("...OK failed with: %s.\n", dlerror ());
> +
> +  xpthread_barrier_wait (&barrier);
> +  xpthread_join (blocked_thread);
> +
> +  /* Close the modules.  */
> +  for (int i = 0; i < 6; ++i)
> +    xdlclose (mods[i]);
> +
> +  return 0;
> +}
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 978e08f4fb..7f891c2710 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -247,6 +247,23 @@ e.g. LD_AUDIT, or will use more than 4 dynamic link namespaces as created
>  by @code{dlmopen} with an lmid argument of @code{LM_ID_NEWLM}.
>  @end deftp
>  
> +@deftp Tunable glibc.rtld.optional_static_tls
> +Sets the amount of surplus static TLS in bytes to allocate at program
> +startup.  Every thread created allocates this amount of specified surplus
> +static TLS. This is a minimum value and additional space may be allocated
> +for internal purposes including alignment.  Optional static TLS is used for
> +optimizing dynamic TLS access for platforms that support such optimizations
> +e.g. TLS descriptors or optimized TLS access for POWER (@code{DT_PPC64_OPT}
> +and @code{DT_PPC_OPT}).  In order to make the best use of such optimizations
> +the value should be as many bytes as would be required to hold all TLS
> +variables in all dynamic loaded shared libraries.  The value cannot be known
> +by the dynamic loader because it doesn't know the expected set of shared
> +libraries which will be loaded.  The existing static TLS space cannot be
> +changed once allocated at process startup.  The default allocation of
> +optional static TLS is 512 bytes and is allocated in every thread.
> +@end deftp
> +
> +
>  @node Elision Tunables
>  @section Elision Tunables
>  @cindex elision tunables
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index 3b0c6d9620..997084fb4b 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -442,6 +442,9 @@ struct rtld_global
>    EXTERN size_t _dl_tls_static_used;
>    /* Alignment requirement of the static TLS block.  */
>    EXTERN size_t _dl_tls_static_align;
> +  /* Remaining amount of static TLS that may be used for optimizing
> +     dynamic TLS access (e.g. with TLSDESC).  */
> +  EXTERN size_t _dl_tls_static_optional;
>  
>  /* Number of additional entries in the slotinfo array of each slotinfo
>     list element.  A large number makes it almost certain take we never
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-07-06 13:13   ` Carlos O'Donell
@ 2020-07-06 17:32     ` Szabolcs Nagy
  2020-07-06 18:33       ` Carlos O'Donell
  2020-07-07  2:57       ` Carlos O'Donell
  0 siblings, 2 replies; 12+ messages in thread
From: Szabolcs Nagy @ 2020-07-06 17:32 UTC (permalink / raw)
  To: Carlos O'Donell, Florian Weimer; +Cc: libc-alpha

The 07/06/2020 09:13, Carlos O'Donell wrote:
> On 6/22/20 12:21 PM, Szabolcs Nagy wrote:
> > TLS_STATIC_SURPLUS is 1664 bytes currently which is not enough to
> > support DL_NNS (== 16) number of dynamic link namespaces, if we
> > assume 192 bytes of TLS are reserved for libc use and 144 bytes
> > are reserved for other system libraries that use IE TLS.
> > 
> > A new tunable is introduced to control the number of supported
> > namespaces and to adjust the surplus static TLS size as follows:
> > 
> > surplus_tls = 192 * (rtld.nns-1) + 144 * rtld.nns + 512
> > 
> > The default is rtld.nns == 4 and then the surplus TLS size is the
> > same as before, so the behaviour is unchanged by default. If an
> > application creates more namespaces than the rtld.nns setting
> > allows, then it is not guaranteed to work, but the limit is not
> > checked. So existing usage will continue to work, but in the
> > future if an application creates more than 4 dynamic link
> > namespaces then the tunable will need to be set.
> > 
> > In this patch DL_NNS is a fixed value and provides a maximum to
> > the rtld.nns setting.
> > 
> > Static linking used fixed 2048 bytes surplus TLS, this is changed
> > so the same contract is used as for dynamic linking.  With static
> > linking DL_NNS == 1 so rtld.nns tunable is forced to 1, so by
> > default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
> > change is not expected to cause problems.
> > 
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> OK for master.
> 
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>

thanks for the review.

my commit message is no longer valid since the rseq change

-#define TLS_STATIC_SURPLUS     64 + DL_NNS * 100
+#define TLS_STATIC_SURPLUS     64 + DL_NNS * 176

i wonder if i should change my patch such that the default
surplus matches this new value (2880)?

e.g. rtld.nns default can be 5 instead of 4 and the optional
tls for optimizations can be 1024 instead of 512 bytes, that
would give 4*192+5*144+1024 = 2512 bytes surplus by default.

or should i override the rseq change and require users to use
rtld.nns? (e.g. use it in the auditmany tests?)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-07-06 17:32     ` Szabolcs Nagy
@ 2020-07-06 18:33       ` Carlos O'Donell
  2020-07-06 18:37         ` Florian Weimer
  2020-07-07  2:57       ` Carlos O'Donell
  1 sibling, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2020-07-06 18:33 UTC (permalink / raw)
  To: Szabolcs Nagy, Florian Weimer; +Cc: libc-alpha

On 7/6/20 1:32 PM, Szabolcs Nagy wrote:
> The 07/06/2020 09:13, Carlos O'Donell wrote:
>> On 6/22/20 12:21 PM, Szabolcs Nagy wrote:
>>> TLS_STATIC_SURPLUS is 1664 bytes currently which is not enough to
>>> support DL_NNS (== 16) number of dynamic link namespaces, if we
>>> assume 192 bytes of TLS are reserved for libc use and 144 bytes
>>> are reserved for other system libraries that use IE TLS.
>>>
>>> A new tunable is introduced to control the number of supported
>>> namespaces and to adjust the surplus static TLS size as follows:
>>>
>>> surplus_tls = 192 * (rtld.nns-1) + 144 * rtld.nns + 512
>>>
>>> The default is rtld.nns == 4 and then the surplus TLS size is the
>>> same as before, so the behaviour is unchanged by default. If an
>>> application creates more namespaces than the rtld.nns setting
>>> allows, then it is not guaranteed to work, but the limit is not
>>> checked. So existing usage will continue to work, but in the
>>> future if an application creates more than 4 dynamic link
>>> namespaces then the tunable will need to be set.
>>>
>>> In this patch DL_NNS is a fixed value and provides a maximum to
>>> the rtld.nns setting.
>>>
>>> Static linking used fixed 2048 bytes surplus TLS, this is changed
>>> so the same contract is used as for dynamic linking.  With static
>>> linking DL_NNS == 1 so rtld.nns tunable is forced to 1, so by
>>> default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
>>> change is not expected to cause problems.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>>
>> OK for master.
>>
>> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> 
> thanks for the review.
> 
> my commit message is no longer valid since the rseq change

I see this with your patch:

FAIL: elf/tst-auditmany
original exit status 139

Program received signal SIGSEGV, Segmentation fault.
_dl_close_worker (map=<optimized out>, force=force@entry=true) at dl-close.c:785
785	      if (head->l_auditing == 0)
(gdb) bt
#0  _dl_close_worker (map=<optimized out>, force=force@entry=true) at dl-close.c:785
#1  0x00007ffff7fe5fc3 in _dl_open (file=0x7fffffffcba0 "\253\323z\366\377\177", 
    mode=<optimized out>, caller_dlopen=0x7ffff7fd4340 <dl_main>, nsid=9, argc=2, 
    argv=<optimized out>, env=0x7fffffffd1c8) at dl-open.c:906
#2  0x00007ffff7fd320e in dlmopen_doit (a=a@entry=0x7fffffffce50) at rtld.c:660
#3  0x00007ffff7fedfee in _dl_catch_exception (exception=exception@entry=0x7fffffffcd90, 
    operate=0x7ffff7fd31d0 <dlmopen_doit>, args=0x7fffffffce50) at dl-error-skeleton.c:208
#4  0x00007ffff7fee093 in _dl_catch_error (objname=0x7fffffffce40, errstring=0x7fffffffce48, 
    mallocedp=0x7fffffffce3f, operate=<optimized out>, args=<optimized out>)
    at dl-error-skeleton.c:227
#5  0x00007ffff7fd68e4 in load_audit_module (last_audit=<synthetic pointer>, 
    name=0x7fffffffcf28 "tst-auditmanymod9.so") at rtld.c:976
#6  load_audit_modules (audit_list=0x7fffffffce90, main_map=<optimized out>) at rtld.c:1117
#7  dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, 
    auxv=<optimized out>) at rtld.c:1682
#8  0x00007ffff7fecad9 in _dl_sysdep_start (start_argptr=<optimized out>, 
    dl_main=0x7ffff7fd4340 <dl_main>) at ../elf/dl-sysdep.c:252
#9  0x00007ffff7fd3e91 in _dl_start_final (arg=0x7fffffffd190) at rtld.c:489
#10 _dl_start (arg=0x7fffffffd190) at rtld.c:582
#11 0x00007ffff7fd3098 in _start () from /mnt/ssd/carlos/build/glibc-review/elf/ld.so
(gdb) 

(gdb) p (*args.map)->l_auditing
$4 = 0

This is the bug Florian ran into already?

https://sourceware.org/pipermail/libc-alpha/2020-June/114659.html

I believe this is because the dlopen of the audit module fails and we unwind
everything leaving an empty link namespace.

> -#define TLS_STATIC_SURPLUS     64 + DL_NNS * 100
> +#define TLS_STATIC_SURPLUS     64 + DL_NNS * 176
> 
> i wonder if i should change my patch such that the default
> surplus matches this new value (2880)?
>
> e.g. rtld.nns default can be 5 instead of 4 and the optional
> tls for optimizations can be 1024 instead of 512 bytes, that
> would give 4*192+5*144+1024 = 2512 bytes surplus by default.
> 
> or should i override the rseq change and require users to use
> rtld.nns? (e.g. use it in the auditmany tests?)
 
I would choose to keep 512 bytes, a lower value, and see how
things play out in userspace as this glibc is deployed downstream.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-07-06 18:33       ` Carlos O'Donell
@ 2020-07-06 18:37         ` Florian Weimer
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Weimer @ 2020-07-06 18:37 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Szabolcs Nagy, libc-alpha

* Carlos O'Donell:

> This is the bug Florian ran into already?
>
> https://sourceware.org/pipermail/libc-alpha/2020-June/114659.html
>
> I believe this is because the dlopen of the audit module fails and we unwind
> everything leaving an empty link namespace.

Okay, I will resubmit the original patch with a different commit message
and comment.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-07-06 17:32     ` Szabolcs Nagy
  2020-07-06 18:33       ` Carlos O'Donell
@ 2020-07-07  2:57       ` Carlos O'Donell
  2020-07-07 11:09         ` Szabolcs Nagy
  1 sibling, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2020-07-07  2:57 UTC (permalink / raw)
  To: Szabolcs Nagy, Florian Weimer; +Cc: libc-alpha

On 7/6/20 1:32 PM, Szabolcs Nagy wrote:
> The 07/06/2020 09:13, Carlos O'Donell wrote:
>> On 6/22/20 12:21 PM, Szabolcs Nagy wrote:
>>> TLS_STATIC_SURPLUS is 1664 bytes currently which is not enough to
>>> support DL_NNS (== 16) number of dynamic link namespaces, if we
>>> assume 192 bytes of TLS are reserved for libc use and 144 bytes
>>> are reserved for other system libraries that use IE TLS.
>>>
>>> A new tunable is introduced to control the number of supported
>>> namespaces and to adjust the surplus static TLS size as follows:
>>>
>>> surplus_tls = 192 * (rtld.nns-1) + 144 * rtld.nns + 512
>>>
>>> The default is rtld.nns == 4 and then the surplus TLS size is the
>>> same as before, so the behaviour is unchanged by default. If an
>>> application creates more namespaces than the rtld.nns setting
>>> allows, then it is not guaranteed to work, but the limit is not
>>> checked. So existing usage will continue to work, but in the
>>> future if an application creates more than 4 dynamic link
>>> namespaces then the tunable will need to be set.
>>>
>>> In this patch DL_NNS is a fixed value and provides a maximum to
>>> the rtld.nns setting.
>>>
>>> Static linking used fixed 2048 bytes surplus TLS, this is changed
>>> so the same contract is used as for dynamic linking.  With static
>>> linking DL_NNS == 1 so rtld.nns tunable is forced to 1, so by
>>> default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
>>> change is not expected to cause problems.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>>
>> OK for master.
>>
>> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> 
> thanks for the review.
> 
> my commit message is no longer valid since the rseq change
> 
> -#define TLS_STATIC_SURPLUS     64 + DL_NNS * 100
> +#define TLS_STATIC_SURPLUS     64 + DL_NNS * 176
> 
> i wonder if i should change my patch such that the default
> surplus matches this new value (2880)?
> 
> e.g. rtld.nns default can be 5 instead of 4 and the optional
> tls for optimizations can be 1024 instead of 512 bytes, that
> would give 4*192+5*144+1024 = 2512 bytes surplus by default.
> 
> or should i override the rseq change and require users to use
> rtld.nns? (e.g. use it in the auditmany tests?)
 
Your patch here + rseq now cause this, and I see it when doing
CI testing of your patches (both applied + Florian's fix to
prevent the segfault):

FAIL: elf/tst-auditmany

The audit module load failure is ignored as expected:
ERROR: ld.so: object 'tst-auditmanymod9.so' cannot be loaded as audit interface: cannot allocate memory in static TLS block; ignored.

However, we then subsequently fail to load libc.so.6 in the
main application because we run out of static TLS:
elf/tst-auditmany: error while loading shared libraries: /home/carlos/build/glibc-review/libc.so.6: cannot allocate memory in static TLS block

So I think we *do* have to bump this up a bit.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-07-07  2:57       ` Carlos O'Donell
@ 2020-07-07 11:09         ` Szabolcs Nagy
  2020-07-07 12:38           ` Carlos O'Donell
  0 siblings, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2020-07-07 11:09 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Florian Weimer, libc-alpha

The 07/06/2020 22:57, Carlos O'Donell wrote:
> On 7/6/20 1:32 PM, Szabolcs Nagy wrote:
> > my commit message is no longer valid since the rseq change
> > 
> > -#define TLS_STATIC_SURPLUS     64 + DL_NNS * 100
> > +#define TLS_STATIC_SURPLUS     64 + DL_NNS * 176
> > 
> > i wonder if i should change my patch such that the default
> > surplus matches this new value (2880)?
> > 
> > e.g. rtld.nns default can be 5 instead of 4 and the optional
> > tls for optimizations can be 1024 instead of 512 bytes, that
> > would give 4*192+5*144+1024 = 2512 bytes surplus by default.
> > 
> > or should i override the rseq change and require users to use
> > rtld.nns? (e.g. use it in the auditmany tests?)
>  
> Your patch here + rseq now cause this, and I see it when doing
> CI testing of your patches (both applied + Florian's fix to
> prevent the segfault):
> 
> FAIL: elf/tst-auditmany
> 
> The audit module load failure is ignored as expected:
> ERROR: ld.so: object 'tst-auditmanymod9.so' cannot be loaded as audit interface: cannot allocate memory in static TLS block; ignored.
> 
> However, we then subsequently fail to load libc.so.6 in the
> main application because we run out of static TLS:
> elf/tst-auditmany: error while loading shared libraries: /home/carlos/build/glibc-review/libc.so.6: cannot allocate memory in static TLS block
> 
> So I think we *do* have to bump this up a bit.

i'm testing a separate patch that uses

 surplus_tls = 192*(nns-1) + 144*nns + 512

with nns = rtld.nns + audit modules.

so audit modules implicitly increase nns (until it hits
DL_NNS which is currently a fatal error).

the change does not look very intrusive to me and should
fix elf/tst-auditmany.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces
  2020-07-07 11:09         ` Szabolcs Nagy
@ 2020-07-07 12:38           ` Carlos O'Donell
  0 siblings, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2020-07-07 12:38 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Florian Weimer, libc-alpha

On 7/7/20 7:09 AM, Szabolcs Nagy wrote:
> The 07/06/2020 22:57, Carlos O'Donell wrote:
>> On 7/6/20 1:32 PM, Szabolcs Nagy wrote:
>>> my commit message is no longer valid since the rseq change
>>>
>>> -#define TLS_STATIC_SURPLUS     64 + DL_NNS * 100
>>> +#define TLS_STATIC_SURPLUS     64 + DL_NNS * 176
>>>
>>> i wonder if i should change my patch such that the default
>>> surplus matches this new value (2880)?
>>>
>>> e.g. rtld.nns default can be 5 instead of 4 and the optional
>>> tls for optimizations can be 1024 instead of 512 bytes, that
>>> would give 4*192+5*144+1024 = 2512 bytes surplus by default.
>>>
>>> or should i override the rseq change and require users to use
>>> rtld.nns? (e.g. use it in the auditmany tests?)
>>  
>> Your patch here + rseq now cause this, and I see it when doing
>> CI testing of your patches (both applied + Florian's fix to
>> prevent the segfault):
>>
>> FAIL: elf/tst-auditmany
>>
>> The audit module load failure is ignored as expected:
>> ERROR: ld.so: object 'tst-auditmanymod9.so' cannot be loaded as audit interface: cannot allocate memory in static TLS block; ignored.
>>
>> However, we then subsequently fail to load libc.so.6 in the
>> main application because we run out of static TLS:
>> elf/tst-auditmany: error while loading shared libraries: /home/carlos/build/glibc-review/libc.so.6: cannot allocate memory in static TLS block
>>
>> So I think we *do* have to bump this up a bit.
> 
> i'm testing a separate patch that uses
> 
>  surplus_tls = 192*(nns-1) + 144*nns + 512
> 
> with nns = rtld.nns + audit modules.
> 
> so audit modules implicitly increase nns (until it hits
> DL_NNS which is currently a fatal error).
> 
> the change does not look very intrusive to me and should
> fix elf/tst-auditmany.
 
Awesome. Do you think you we can fix this by the end of
the week? It would be good to finally fix this issue and
also meet the 2.32 schedule.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-07 12:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-22 16:20 [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy
2020-06-22 16:21 ` [PATCH v5 1/2] rtld: Add rtld.nns tunable for the number of supported namespaces Szabolcs Nagy
2020-07-06 13:13   ` Carlos O'Donell
2020-07-06 17:32     ` Szabolcs Nagy
2020-07-06 18:33       ` Carlos O'Donell
2020-07-06 18:37         ` Florian Weimer
2020-07-07  2:57       ` Carlos O'Donell
2020-07-07 11:09         ` Szabolcs Nagy
2020-07-07 12:38           ` Carlos O'Donell
2020-06-22 16:21 ` [PATCH v5 2/2] rtld: Avoid using up static TLS surplus for optimizations [BZ #25051] Szabolcs Nagy
2020-07-06 13:19   ` Carlos O'Donell
2020-06-26 10:50 ` [PATCH v5 0/2] Improve surplus TLS accounting Szabolcs Nagy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).