* LD_AUDIT: Not enough space in static TLS block
From: Jonathon Anderson @ 2022-04-11 20:24 UTC
To: Carlos O'Donell, Florian Weimer, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P.
Cc: libc-alpha, John Mellor-Crummey

Hello all,

We (the HPCToolkit team) have encountered another critical LD_AUDIT
bug. When LD_AUDIT is specified, the allocation of the static TLS
block does not account for the TLS requirements of executable
dependencies or of the auditors themselves. If:
  - an executable accesses a thread-local variable in a linked library
    with sufficiently large TLS requirements, or
  - an auditor itself uses sufficiently large TLS and optimizes access
    with `-ftls-model=initial-exec`,

then the process or auditor will fail with the error "cannot allocate
memory in static TLS block."

This is a critical issue for us. We have observed this issue affecting
RAJA, a template-based library for efficient parallel computation that
is widely used among HPC applications. It would help us greatly if this
issue were fixed for 2.36 and backported along with the other
LD_AUDIT-related patches.

We have added this issue and a minimal reproducer to our document of
auditor bugs:
https://docs.google.com/document/d/1dVaDBdzySecxQqD6hLLzDrEF18M1UtjDna9gL5BWWI0

-Jonathon
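For readers without access to the linked reproducer, the second failure condition can be pictured with a minimal sketch like the one below. It is not the HPCToolkit reproducer: the buffer size `AUDITOR_TLS_BYTES` and the helper `auditor_touch_tls` are hypothetical names chosen for illustration. Only `la_version`, `LAV_CURRENT` and the `tls_model` attribute are existing interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical size, picked only to be large compared with a small
   static TLS reserve; it is not a value from the thread. */
#define AUDITOR_TLS_BYTES 8192

/* tls_model("initial-exec") is the per-variable form of compiling with
   -ftls-model=initial-exec: the variable is reached at a fixed offset
   from the thread pointer, so it must fit in the static TLS block. */
static __thread char scratch[AUDITOR_TLS_BYTES]
    __attribute__((tls_model("initial-exec")));

/* Illustration helper: stores one byte in the auditor's TLS buffer and
   reads it back. */
char auditor_touch_tls(size_t index, char value)
{
    assert(index < sizeof scratch);
    scratch[index] = value;
    return scratch[index];
}

/* Entry point that every auditor must provide. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    auditor_touch_tls(0, 1);  /* reference the initial-exec TLS */
    return LAV_CURRENT;
}
```

Built as a shared object (for example `cc -shared -fPIC -o libaudit.so audit.c`) and loaded with `LD_AUDIT=./libaudit.so`, an auditor of this shape asks for static TLS space before the dynamic loader has sized the block. Whether loading actually fails depends on the glibc version and on the configured reserve.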
* Re: LD_AUDIT: Not enough space in static TLS block
From: Florian Weimer @ 2022-04-12 7:44 UTC
To: Jonathon Anderson
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P., libc-alpha, John Mellor-Crummey

* Jonathon Anderson:

> Hello all,
>
> We (the HPCToolkit team) have encountered another critical LD_AUDIT
> bug. When LD_AUDIT is specified, the allocation of the static TLS
> block does not account for the TLS requirements of executable
> dependencies or of the auditors themselves. If:
>   - an executable accesses a thread-local variable in a linked library
>     with sufficiently large TLS requirements, or
>   - an auditor itself uses sufficiently large TLS and optimizes access
>     with `-ftls-model=initial-exec`,
>
> then the process or auditor will fail with the error "cannot allocate
> memory in static TLS block."

We have a tunable that can be used as a workaround. Your reproducer
passes for me with our 2.28 backport (glibc-2.28-164.el8) if I run it
like this:

  GLIBC_TUNABLES=glibc.rtld.optional_static_tls=4000 make

The best we can do in the short term would be an increase of the default
limit. On 64-bit platforms, defaulting to a dozen or so kilobytes per
thread should not be a problem as far as virtual address space
consumption is concerned. We can also add an additional reservation of
similar size for every auditor that is loaded, to compensate for the
lack of auto-tuning of the TLS allocation size in auditing mode.

The fundamental issue is that there is always going to be a hard limit
for initial-exec TLS. Initial-exec TLS requires a fixed offset from the
thread pointer, and we cannot relocate TLS variables because they are
ordinary C objects with an observable address. There are some other
things we can try to improve auto-tuning, but in the end, there is
always going to be a fixed-size reserved area dedicated to initial-exec
TLS set up at process startup, and with dlopen, that might not be enough
even without any auditor use.

Thanks,
Florian
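The remark that TLS variables are ordinary C objects with an observable address can be made concrete with a small sketch. All names below are hypothetical and chosen for illustration. The point is that a program may keep a pointer to a thread-local object and use it later, so the runtime cannot move the object after the pointer has been handed out.

```c
#include <assert.h>
#include <stddef.h>

/* One counter per thread. */
static __thread int counter;

/* Hands out the address of the calling thread's counter. */
int *counter_address(void)
{
    return &counter;
}

/* Reads the counter through its name rather than through a pointer. */
int counter_value(void)
{
    return counter;
}

/* Caches the address, writes through it, then reads the variable by
   name. This only works because the object stays where it is. */
int write_through_cached_pointer(int value)
{
    int *cached = counter_address();
    assert(cached != NULL);
    *cached = value;
    return counter_value();
}
```

If the static TLS area were moved to a larger allocation after `counter_address()` had returned, `cached` would point at the old location, which is why the reserved area has to be sized up front.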
* Re: LD_AUDIT: Not enough space in static TLS block
From: Florian Weimer @ 2022-05-03 7:22 UTC
To: Jonathon Anderson
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P., libc-alpha, John Mellor-Crummey

* Florian Weimer:

> We have a tunable that can be used as a workaround. Your reproducer
> passes for me with our 2.28 backport (glibc-2.28-164.el8) if I run it
> like this:
>
>   GLIBC_TUNABLES=glibc.rtld.optional_static_tls=4000 make
>
> [...]

Jonathon,

does setting the environment variable work for you?

Thanks,
Florian
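Where exporting the variable in a shell is inconvenient, the same workaround could be applied from a small launcher. The sketch below assumes only the tunable name quoted in the thread. The function names `format_static_tls_tunable` and `exec_with_static_tls` are hypothetical, and the value 4000 is simply the one used in the example above, not a recommendation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes "glibc.rtld.optional_static_tls=<bytes>" into buf.
   Returns 0 on success, -1 if the buffer is too small. */
int format_static_tls_tunable(char *buf, size_t len, unsigned long bytes)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "glibc.rtld.optional_static_tls=%lu", bytes);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Sets GLIBC_TUNABLES and replaces the current process with argv[0].
   Returns -1 on failure; on success it does not return. Note that this
   overwrites any tunables already present in the environment. */
int exec_with_static_tls(unsigned long bytes, char *const argv[])
{
    char value[64];

    if (format_static_tls_tunable(value, sizeof value, bytes) != 0)
        return -1;
    if (setenv("GLIBC_TUNABLES", value, 1) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;  /* only reached if execvp failed */
}
```

The tunable is read by the dynamic loader at process startup, so it has to be in the environment before the audited program is executed. Setting it from inside an already running process has no effect on that process.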
* Re: LD_AUDIT: Not enough space in static TLS block
From: Florian Weimer @ 2022-05-05 17:30 UTC
To: Jonathon Anderson
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P., libc-alpha, John Mellor-Crummey

* Florian Weimer:

> [...]
>
> Jonathon,
>
> does setting the environment variable work for you?

Do you have any additional feedback here?

In the meantime, we have updated Fedora rawhide with the bug fix to
enable early <dlfcn.h> usage from auditors, and the new RTLD_DI_PHDR
dlinfo is included as well. If you could test glibc-2.35.9000-16 or
later, that would be great.

Thanks,
Florian
* Re: LD_AUDIT: Not enough space in static TLS block
From: Jonathon Anderson @ 2022-05-05 19:56 UTC
To: Florian Weimer
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P., libc-alpha, John Mellor-Crummey

On 5/5/22 12:30, Florian Weimer wrote:
>> [...]
>>
>> Jonathon,
>>
>> does setting the environment variable work for you?
> Do you have any additional feedback here?

Sorry for the delayed response (it's ECP conference week).

*This tunable works for us as a stopgap until a long-term solution can
be implemented.*

I had a separate (email) chat with Ben Woodard bouncing ideas for a
long-term solution. A major difficulty is that LD_AUDIT currently
introduces a cyclic dependency:
  - auditors must be loaded before searching for the application's
    dependencies (since la_objsearch may modify the results), and
  - dependency searches must complete before the static TLS auto-tuning
    (since the TLS sizes of the initial link-map must be known), but
  - the static TLS block must be allocated before auditors are loaded
    (since auditors may also use initial-exec TLS).

So, I'm not hopeful for a long-term solution that does not involve
another LAV_CURRENT bump. We (Ben and I) came up with a couple of
initial solutions: disallowing initial-exec TLS in auditors, or
per-auditor static TLS blocks (i.e. TLS namespaces).

Comments and ideas are welcome. (I would love to have a detailed
LD_AUDIT discussion at STW in June.)

> In the meantime, we have updated Fedora rawhide with the bug fix to
> enable early <dlfcn.h> usage from auditors, and the new RTLD_DI_PHDR
> dlinfo is included as well. If you could test glibc-2.35.9000-16 or
> later, that would be great.

Thanks! Our reproducer for the early dl* bug passes with the latest
Fedora Rawhide. I'll look into using RTLD_DI_PHDR in HPCToolkit in the
coming weeks.

-Jonathon
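The cyclic dependency described in this message can be checked mechanically. The sketch below is only an illustration: the enum and function names are hypothetical, and the three startup activities are modelled as nodes with "must happen before" rules, ordered with a small topological sort (Kahn's algorithm).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum step { LOAD_AUDITORS, SEARCH_DEPENDENCIES, ALLOCATE_STATIC_TLS, STEP_COUNT };

/* "first" must happen before "then". */
struct before { enum step first, then; };

/* Returns true if the rules admit at least one valid order. */
bool has_valid_order(const struct before *rules, size_t count)
{
    int pending[STEP_COUNT] = {0};   /* unmet prerequisites per step */
    bool done[STEP_COUNT] = {false};

    assert(rules != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        pending[rules[i].then]++;

    for (int placed = 0; placed < STEP_COUNT; placed++) {
        int next = -1;
        for (int s = 0; s < STEP_COUNT; s++)
            if (!done[s] && pending[s] == 0) { next = s; break; }
        if (next < 0)
            return false;  /* every remaining step waits on another: cycle */
        done[next] = true;
        for (size_t i = 0; i < count; i++)
            if (rules[i].first == (enum step)next)
                pending[rules[i].then]--;
    }
    return true;
}

/* The three constraints listed in the message, in the same order. */
const struct before with_initial_exec_auditors[] = {
    { LOAD_AUDITORS, SEARCH_DEPENDENCIES },       /* la_objsearch may change results */
    { SEARCH_DEPENDENCIES, ALLOCATE_STATIC_TLS }, /* TLS sizes must be known first */
    { ALLOCATE_STATIC_TLS, LOAD_AUDITORS },       /* auditors may use initial-exec TLS */
};
```

With all three rules no order exists. Passing only the first two rules, which corresponds to the first proposal (auditors do not use initial-exec TLS), yields a valid order.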
* Re: LD_AUDIT: Not enough space in static TLS block
From: Florian Weimer @ 2022-05-11 13:59 UTC
To: Jonathon Anderson
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P., libc-alpha, John Mellor-Crummey

* Jonathon Anderson:

> This tunable works for us as a stopgap until a long-term solution can
> be implemented.

Good to know, thanks.

> I had a separate (email) chat with Ben Woodard bouncing ideas for a
> long-term solution. A major difficulty is that LD_AUDIT currently
> introduces a cyclic dependency:
>   - auditors must be loaded before searching for the application's
>     dependencies (since la_objsearch may modify the results), and
>   - dependency searches must complete before the static TLS
>     auto-tuning (since the TLS sizes of the initial link-map must be
>     known), but
>   - the static TLS block must be allocated before auditors are loaded
>     (since auditors may also use initial-exec TLS).
>
> So, I'm not hopeful for a long-term solution that does not involve
> another LAV_CURRENT bump. We (me and Ben) came up with a couple of
> initial solutions: disallowing initial-exec TLS in auditors,

I'm not sure if this is feasible. It would mean we cannot use
initial-exec TLS in glibc at all, or in libstdc++ (in case auditors are
written in C++).

And we don't want to build libraries twice (for auditor usage).

> or per-auditor static TLS blocks (ie. TLS namespaces).

We already have that, but there is just one thread pointer, so that does
not solve the problem. Using a secondary thread pointer has the second
build problem, too.

Auditor TLS usage has a conceptually simple fix, though. (It's simple
in concept, but implementation requires some refactoring.) Recall that
for regular process startup (without auditing), we do this:

  (1) map the main executable (the kernel may do this for us)
  (2) recursively map all the dependencies
  (3) calculate static TLS usage
  (4) allocate static TLS space
  (5) perform relocation
  (6) assign TLS variables their initial values
  (7) start running user code (initializers, main)

Once auditors are in the mix, we do this instead:

  (1) guess static TLS usage
  (2) allocate static TLS space
  (3) load each audit module individually, in sequence, as if per dlmopen:
    (3.1) map the auditor
    (3.2) recursively map all its dependencies
    (3.3) perform relocation
    (3.4) calculate and allocate static TLS space (from the global area)
    (3.5) assign TLS variables their initial values
    (3.6) start running auditor code (ELF constructors, la_version)
  (4) map the main executable (the kernel may do this for us)
  (5) recursively map all the dependencies (may involve la_objsearch)
  (6) calculate and allocate static TLS space (from the global area)
  (7) perform relocation
  (8) assign TLS variables their initial values
  (9) start running user code (initializers, main)

Step (1) is the big problem here; it's just a quick hack to get things
going with TLS, but it has been around for a long time. What we should
be doing instead is this:

  (1) load each audit module individually, in sequence (no relocation here):
    (1.1) map the auditor
    (1.2) recursively map all its dependencies
    (1.3) calculate static TLS usage for this auditor namespace
  (2) map the main executable (the kernel may do this for us)
  (3) calculate static TLS usage using all TLS size information seen so far
  (4) allocate static TLS space
  (5) complete loading the auditors (relocation and startup):
    (5.1) perform relocation
    (5.2) calculate and allocate static TLS space (from the global area)
    (5.3) assign TLS variables their initial values
    (5.4) start running auditor code (ELF constructors, la_version)
  (6) recursively map all the dependencies (may involve la_objsearch)
  (7) calculate and allocate static TLS space (from the global area)
  (8) perform relocation
  (9) assign TLS variables their initial values
  (10) start running user code (initializers, main)

With this sequence, direct static TLS usage from auditors is taken into
account for the fixed-size TLS allocation at (4), eliminating the
guesswork. (Step (2) could actually come right before step (6); it
would not alter the picture.)

When no auditor defines la_objsearch, we can do even better and map the
executable and its dependencies before computing the static TLS size,
and only run step (5), complete loading the auditors, after mapping
everything (but before relocation, which needs working auditors for
la_symbind). In this case, we'd have the same level of information
regarding TLS usage as in the non-auditor case (which is still not
enough in all cases, but another incremental improvement).

With la_objsearch, we could pull a few more tricks. Auditors could
advertise that the addresses of their TLS variables do not matter, which
would enable us to relocate the TLS space as we discover more objects
that need more static TLS. Or we could unload the auditors on TLS
exhaustion and start again with a larger space estimate. Auditors could
provide their own guesses for static TLS usage that we query upfront and
take into account for the size calculation.

None of this solves the general dlopen case, though. I have some ideas
for that, which boil down to “just provide enough address space during
early startup, so that you never exceed it until the initialization
phase with dlopen is complete”. This needs a new TCB allocator, though,
so it's also quite involved to implement. It does not solve the problem
completely, but I expect that it will eliminate pretty much all
shortcomings of initial-exec TLS we have seen in practice.

With that change, we might not even need the two-phased auditor loading.

> Comments and ideas are welcome. (I would love to have a detailed
> LD_AUDIT discussion at STW in June.)

Uhm, what's STW?

> Our reproducer for the early dl* bug passes with the latest Fedora
> Rawhide, I'll look into using RTLD_DI_PHDR in HPCToolkit in the coming
> weeks.

Thanks!

Florian
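The "calculate static TLS usage" steps in the sequences above depend on the PT_TLS segments of the objects mapped so far. As a rough user-space illustration (not the dynamic loader's actual computation, which also has to handle alignment and extra reserve), the sketch below sums the PT_TLS sizes of all objects currently loaded, using the existing `dl_iterate_phdr` interface. The names `sample_tls`, `add_tls_segment` and `loaded_tls_bytes` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* A thread-local buffer so that this program has a PT_TLS segment. */
static __thread char sample_tls[256];

/* Returns the calling thread's copy of the buffer. */
char *sample_tls_address(void)
{
    return sample_tls;
}

/* dl_iterate_phdr callback: adds the in-memory size of each PT_TLS
   segment of one loaded object to the running total. */
static int add_tls_segment(struct dl_phdr_info *info, size_t size, void *data)
{
    size_t *total = data;

    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS)
            *total += ph->p_memsz;
    }
    return 0;  /* zero means: continue with the next object */
}

/* Raw sum of PT_TLS sizes over every object loaded right now. */
size_t loaded_tls_bytes(void)
{
    size_t total = 0;
    int rc = dl_iterate_phdr(add_tls_segment, &total);

    assert(rc == 0);
    return total;
}
```

The number is only available once the objects are mapped, which is the ordering problem the proposed sequence works around: auditors are mapped first, sized, and only then relocated and started.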
* Re: LD_AUDIT: Not enough space in static TLS block
From: Jonathon Anderson @ 2022-05-11 17:31 UTC
To: Florian Weimer
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella, Legendre, Matthew P., libc-alpha, John Mellor-Crummey

On 5/11/22 08:59, Florian Weimer wrote:
>> I had a separate (email) chat with Ben Woodard bouncing ideas for a
>> long-term solution. A major difficulty is that LD_AUDIT currently
>> introduces a cyclic dependency:
>>   - auditors must be loaded before searching for the application's
>>     dependencies (since la_objsearch may modify the results), and
>>   - dependency searches must complete before the static TLS
>>     auto-tuning (since the TLS sizes of the initial link-map must be
>>     known), but
>>   - the static TLS block must be allocated before auditors are loaded
>>     (since auditors may also use initial-exec TLS).
>>
>> So, I'm not hopeful for a long-term solution that does not involve
>> another LAV_CURRENT bump. We (me and Ben) came up with a couple of
>> initial solutions: disallowing initial-exec TLS in auditors,

I wasn't very clear with my half-sentence descriptions, so let me add
some more detail to these ideas.

> I'm not sure if this feasible. It would mean we cannot use initial-exec
> TLS in glibc at all, or in libstdc++ (in case auditors are written in
> C++).
>
> And we don't want to build libraries twice (for auditor usage).

I believe this can be avoided if they're compiled with the gnu2 TLS
variant; that would allow ld.so to relocate so that:
  - the Glibc in the auditor namespace(s) uses a dynamic TLS segment
    (with a performance hit, of course), but
  - the Glibc in the main namespace(s) uses the static TLS segment (with
    no measurable performance degradation).

This extends to other libraries as well, libstdc++ included. On the
ld.so side, the startup sequence would then become something like this:

  (1) allocate TCB and DTV (but not static TLS space)
  (2) load each audit module
    (2.1) map the auditor
    (2.2) recursively map all its dependencies
    (2.3) relocate, where:
      (2.3.1) TLSDESC is always relocated in dynamic form
      (2.3.2) any initial-exec relocation (e.g. R_X86_64_TPOFF64)
              throws an error
    (2.4) assign TLS variables their initial values
    (2.5) start running auditor code (ELF constructors, la_version)
  (3) map the main executable
  (4) recursively map all dependencies
  (5) calculate static TLS usage
  (6) allocate static TLS + new TCB
  (7) move old TCB to new location
  (8) relocate
  (9) assign TLS variables their initial values
  (10) start running user code (initializers, main)

Steps (1-2) and (7) are skipped in the non-auditor case.

(Of course, this idea leaves us in the unenviable position of
negotiating with our dependencies to compile everything with gnu2 TLS.
Caveat emptor.)

>> or per-auditor static TLS blocks (ie. TLS namespaces).
> We already have that, but there is just one thread pointer, so that does
> not solve the problem. Using a secondary thread pointer has the second
> build problem, too.

In this idea the auditors would be responsible for "switching" between
the multiple thread pointers, where all non-auditor namespaces share one
"main" thread pointer. The audit API would need to be expanded with a
few function(-like macros) to manipulate the thread pointer, such as:
  - struct tls_restore tls_switch_to_main_tp();
  - struct tls_restore tls_switch_to_caller_tp();
  - void tls_restore_tp(const struct tls_restore*);

Then auditors (and ld.so) would need to implement a series of
restrictions/conventions to properly maintain TP. One set of rules could
be:
  - All la_* notification calls occur with the auditor's thread pointer
    set.
  - All calls to code outside the auditor namespace (e.g. user code) must
    occur with the main thread pointer set (i.e. wrapped in a
    tls_switch_to_main_tp()/tls_restore_tp() pair).
  - All calls from outside the auditor namespace (e.g. wrapped symbols)
    occur with the main thread pointer set (i.e. the auditor should wrap
    the contents in a tls_switch_to_caller_tp()/tls_restore_tp() pair).

(Of course, any bugs in any auditor's TP management will cause very
subtle errors in the application, and auditors will need to be careful
that their dependencies don't naively call code loaded via
dlmopen(LM_ID_BASE). Caveat emptor.)

> Auditor TLS usage has a conceptually simple fix, though. (It's simple
> in concept, but implementation requires some refactoring.) Recall that
> for regular process startup (without auditing), we do this:
>
>   (1) map the main executable (the kernel may do this for us)
>   (2) recursively map all the dependencies
>   (3) calculate static TLS usage
>   (4) allocate static TLS space
>   (5) perform relocation
>   (6) assign TLS variables their initial values
>   (7) start running user code (initializers, main)
>
> Once auditors are in the mix, we do this instead:
>
>   (1) guess static TLS usage
>   (2) allocate static TLS space
>   (3) load each audit module individually, in sequence, as if per dlmopen:
>     (3.1) map the auditor
>     (3.2) recursively map all its dependencies
>     (3.3) perform relocation
>     (3.4) calculate and allocate static TLS space (from the global area)
>     (3.5) assign TLS variables their initial values
>     (3.6) start running auditor code (ELF constructors, la_version)
>   (4) map the main executable (the kernel may do this for us)
>   (5) recursively map all the dependencies (may involve la_objsearch)
>   (6) calculate and allocate static TLS space (from the global area)
>   (7) perform relocation
>   (8) assign TLS variables their initial values
>   (9) start running user code (initializers, main)
>
> Step (1) is the big problem here, it's just a quick hack to get things
> going with TLS, but it has been around for a long time. What we should
> be doing instead is this:
>
>   (1) load each audit module individually, in sequence (no relocation here):
>     (1.1) map the auditor
>     (1.2) recursively map all its dependencies
>     (1.3) calculate static TLS usage for this auditor namespace
>   (2) map the main executable (the kernel may do this for us)
>   (3) calculate static TLS usage using all TLS size information seen so far
>   (4) allocate static TLS space
>   (5) complete loading the auditors (relocation and startup):
>     (5.1) perform relocation
>     (5.2) calculate and allocate static TLS space (from the global area)
>     (5.3) assign TLS variables their initial values
>     (5.4) start running auditor code (ELF constructors, la_version)
>   (6) recursively map all the dependencies (may involve la_objsearch)
>   (7) calculate and allocate static TLS space (from the global area)
>   (8) perform relocation
>   (9) assign TLS variables their initial values
>   (10) start running user code (initializers, main)
>
> With this sequence, direct static TLS usage from auditors is taken into
> account for the fixed-size TLS allocation at (4), eliminating the
> guesswork. (Step (2) could actually come right before step (6), it
> would not alter the picture.)

Step (3) doesn't account for the static TLS usage of the executable's
dependencies, so this doesn't completely solve the issue. (The cyclic
dependency I mentioned before is roughly
(3) -> (4) -> (5.4) -> (6) -> (3).)

> When no auditor defines la_objsearch, we can do even better and map the
> executable and its dependencies before computing the static TLS size,
> and only run step (5), complete loading the auditors, after mapping
> everything (but before relocation, which needs working auditors for
> la_symbind). In this case, we'd have the same level of information
> regarding TLS usage as in the non-auditor case (which is still not
> enough in all cases, but another incremental improvement).

> With la_objsearch, we could pull a few more tricks. Auditors could
> advertise that the address of their TLS variables do not matter, which
> would enable us to relocate the TLS space as we discover more objects
> that need more static TLS.

If I understand correctly, this optimization would require that the
auditor and all its dependencies don't take the address of TLS
variables. This won't be feasible for us (we have complex dependencies).
To be usable by any reasonable auditor, this restriction would have to
be satisfied by (at least) Glibc and libstdc++. Compiler assistance
would help here, although I highly doubt this restriction will often be
achieved with C++ code (e.g. returning a const reference to a TLS
variable is enough to break this restriction).

> Or we could unload the auditors on TLS
> exhaustion and start again with a larger space estimate.

We would need some additions to the auditor API to indicate when this is
happening (vs. normal program termination) and to move data to the
reloaded instance. This would also require significant violence to our
measurement infrastructure to support save/reload like this, so this is
not really a preferable solution.

> Auditors could
> provide their own guesses for static TLS usage that we query upfront and
> take into account for the size calculation.

Is this different from setting the tunable (except with a fancier
interface)?

> None of this solves the general dlopen case, though. I have some ideas
> for that, which boils down to “just provide enough address space during
> early startup, so that you never exceed it until the initialization
> phase with dlopen is complete”. This needs a new TCB allocator, though,
> so it's also quite involved to implement. It does not solve the problem
> completely, but I expect that it will eliminate pretty much all
> shortcomings of initial-exec TLS we have seen in practice.
>
> With that change, we might not even need the two-phased auditor loading.

I'm not sure I understand this solution. Would this require the static
TLS block to grow as new libraries are loaded during ELF constructors?
Or for the entire execution? Would this then mean that any library can
use initial-exec TLS, regardless of whether it's part of the initial
link map?

It sounds magical, but if it's possible it would definitely solve the
issue.

>> Comments and ideas are welcome. (I would love to have a detailed
>> LD_AUDIT discussion at STW in June.)
> Uhm, what's STW?

As Dr. Mellor-Crummey corrected me, it's the Scalable Tools Workshop
(https://dyninst.github.io/scalable_tools_workshop/petascale2022/).
Force of habit using the acronym, sorry for the confusion.

-Jonathon
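The thread-pointer switching convention proposed earlier in this message can be pictured with a toy model. Nothing below exists in glibc. The function names are the ones proposed above, but their bodies are invented stand-ins: a thread-local enum plays the role of the real thread pointer, `tls_switch_to_caller_tp` is left out, and `la_notification_model` and `user_callback` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real thread pointer: records which view is active. */
enum tp_view { TP_MAIN, TP_AUDITOR };
static __thread enum tp_view active_tp = TP_MAIN;

/* Token that remembers which view to return to. */
struct tls_restore { enum tp_view previous; };

/* Returns the view that is active right now. */
enum tp_view current_tp(void)
{
    return active_tp;
}

/* Invented body for the proposed call: switch to the main view and
   return a token that remembers the previous one. */
struct tls_restore tls_switch_to_main_tp(void)
{
    struct tls_restore saved = { active_tp };
    active_tp = TP_MAIN;
    return saved;
}

/* Invented body for the proposed call: undo a previous switch. */
void tls_restore_tp(const struct tls_restore *saved)
{
    assert(saved != NULL);
    active_tp = saved->previous;
}

/* Records which view user code observed. */
enum tp_view tp_seen_by_callback;
void user_callback(void)
{
    tp_seen_by_callback = current_tp();
}

/* Models one la_* notification that calls out to user code. */
enum tp_view la_notification_model(void (*user_code)(void))
{
    const enum tp_view on_entry = active_tp;

    active_tp = TP_AUDITOR;                  /* ld.so side: enter the auditor */
    struct tls_restore saved = tls_switch_to_main_tp();
    user_code();                             /* runs with the main view */
    tls_restore_tp(&saved);
    const enum tp_view inside = active_tp;   /* auditor view again */
    active_tp = on_entry;                    /* ld.so side: leave the auditor */
    return inside;
}
```

The model only shows the pairing discipline. A missed `tls_restore_tp` would leave the auditor running with the wrong view, which is the kind of subtle error the caveat above warns about.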