public inbox for gcc-help@gcc.gnu.org
* Accessing TLS variable defined in DSO with static tls model from another DSO with dynamic tls model
From: Ho, Lennox @ 2021-05-27 23:19 UTC (permalink / raw)
  To: gcc-help

Hi!

So this is what I'm trying to figure out.

I have a TLS variable defined and exported in libtls_export.so. 
libtls_export.so is built with -fPIC and -ftls-model=initial-exec.
This means the TLS variable in question should always be located at a constant relative offset (resolved and patched during program startup) from the thread pointer (stored in the FS register).

Now I have another shared library libtls_import.so.
I want to access the TLS variable exported by libtls_export.so in this DSO.
I do not want to build libtls_import.so with the initial-exec TLS model (I want to allow dlopen()-ing libtls_import.so).

Here's a minimal example:

// tls.h

extern __thread int my_tls;

// tls_export.c
// gcc -shared -fPIC -O2 -ftls-model=initial-exec -o libtls_export.so tls_export.c

#include "tls.h"
__thread int my_tls = 0;

// tls_import.c
// gcc -shared -fPIC -O2 -o libtls_import.so tls_import.c

#include "tls.h"
int get_my_tls() {
    return my_tls;
}

When the compiler builds libtls_import.so, it has no way of knowing that my_tls will always be located in a static TLS segment.
Therefore, it emits calls to __tls_get_addr(), which is less than ideal in terms of performance.

So my question is: is there a way to inform gcc that my_tls is always located in a static TLS segment, so that it can just emit a @gottpoff placeholder instead of a call to __tls_get_addr() in libtls_import.so?

A hacky workaround I've thought of is to calculate and cache the relative offset of the TLS variable (essentially &my_tls minus the thread pointer held in the FS base) and then perform our own lookup in libtls_import.so.
This should (?) be safe as long as we make sure libtls_export.so is built with the initial-exec model.
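A rough sketch of that workaround, assuming x86-64 Linux (where the word at %fs:0 holds the thread pointer itself) and GNU C inline assembly. The helper names read_thread_pointer, init_my_tls_offset and my_tls_fast_addr are hypothetical, and my_tls is defined locally so the sketch compiles on its own. The cached offset is only meaningful if my_tls really lives in the static TLS block, i.e. if libtls_export.so is loaded at program startup:

```c
#include <stdint.h>

/* Stand-in for the variable exported by libtls_export.so. */
__thread int my_tls = 0;

/* x86-64 Linux only (assumption): the first word of the TCB, at %fs:0,
   points to the TCB itself, so loading it yields the thread pointer. */
static inline uintptr_t read_thread_pointer(void)
{
    uintptr_t tp;
    __asm__("movq %%fs:0, %0" : "=r"(tp));
    return tp;
}

/* Offset of my_tls from the thread pointer. For a variable in the static
   TLS block this offset is identical in every thread, so it can be
   computed once (hypothetical cache). */
static intptr_t my_tls_offset;

void init_my_tls_offset(void)
{
    my_tls_offset = (intptr_t)((uintptr_t)&my_tls - read_thread_pointer());
}

/* Fast path: thread pointer + cached offset, no __tls_get_addr() call.
   Unsigned arithmetic wraps correctly for the (negative) offset. */
int *my_tls_fast_addr(void)
{
    return (int *)(read_thread_pointer() + (uintptr_t)my_tls_offset);
}
```

init_my_tls_offset() would run once (for example from a constructor in libtls_import.so); afterwards any thread can use my_tls_fast_addr(). This is roughly what the initial-exec model does anyway, except that there the loader stores the offset in a GOT slot instead of user code caching it.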

Another interesting observation is that "readelf -a libtls_import.so | grep STATIC_TLS" yields no results.
So I wonder if I'm fundamentally misunderstanding how static TLS works.

Thanks,
Len


* Re: Accessing TLS variable defined in DSO with static tls model from another DSO with dynamic tls model
From: Alexander Monakov @ 2021-05-28 13:44 UTC (permalink / raw)
  To: Ho, Lennox; +Cc: gcc-help

On Thu, 27 May 2021, Ho, Lennox via Gcc-help wrote:

> So my question is, is there a way to inform gcc that my_tls is always located
> in a static TLS segment and so it can just emit a @gottpoff placeholder
> instead of __tls_get_addr() in libtls_import.so?

You need to pass -ftls-model=initial-exec when compiling libtls_import.so, not
the exporting library. I am a bit surprised you've decided to do that the other
way around :)

(alternatively, you could place __attribute__((tls_model("initial-exec")))
on the declaration of the variable in the header file)
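A minimal sketch of that alternative, assuming the tls.h / tls_export.c / tls_import.c layout from the original post, collapsed into one translation unit so it compiles on its own:

```c
/* tls.h: the attribute travels with the declaration, so every TU that
   includes this header (tls_import.c in particular) uses initial-exec
   access for my_tls, regardless of -ftls-model on the command line. */
extern __thread int my_tls __attribute__((tls_model("initial-exec")));

/* tls_export.c: the definition. This DSO is assumed to be loaded at
   program startup (a DT_NEEDED dependency, never dlopen()ed itself). */
__thread int my_tls = 0;

/* tls_import.c: can still be built with plain -shared -fPIC; accesses to
   my_tls use a TP-relative offset loaded from the GOT instead of a call
   to __tls_get_addr(). */
int get_my_tls(void)
{
    return my_tls;
}
```

Split back into three files, the build commands from the original post should work unchanged.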

Alexander


* RE: Accessing TLS variable defined in DSO with static tls model from another DSO with dynamic tls model
From: Ho, Lennox @ 2021-05-28 18:02 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-help

> You need to pass -ftls-model=initial-exec when compiling libtls_import.so, not
> the exporting library. I am a bit surprised you've decided to do that the other
> way around :)

Ahh ok so it does appear that I really misunderstood how static TLS works :/.

My assumption was that, since (static) TLS variables are placed in the static TLS segment at a constant offset (to the left) from the thread pointer (as shown here https://th.bing.com/th/id/R92210a8cb7df88cb1b9586ce73d8da90?rik=FX6vqTs2uG%2brzw&pid=ImgRaw),
initial-exec/local-exec was a property of the TLS data itself, not of the act of *accessing* said data.

While tagging the DSO that accesses the TLS data (libtls_import.so) with DF_STATIC_TLS will do the trick (libtls_export.so must be loaded before libtls_import.so, which must be loaded before main(), so the TLS in libtls_export.so can be placed in a static TLS section),
surely tagging the export DSO (libtls_export.so) with DF_STATIC_TLS instead would be more natural?
My understanding is that the dynamic loader will have the opportunity to fix up the offsets in any DSO loaded later.

Do you mind elaborating why DF_STATIC_TLS is placed on the client/import DSO and not the export DSO?
Is there something that I'm still missing?

-------------------

To offer some additional context, I have an unfair reader-writer mutex (unfair in that readers are heavily favoured) implementation that uses TLS variables to indicate whether a thread is currently "read-locking".
There are a few other bits and pieces required to make this work, but the advantage of this approach is that we completely eliminate cache contention (typical spin-locks have the problem of threads repeatedly invalidating each other's cache lines!) for readers.

Now, the way I'm deploying this TLS variable is by exporting it from a "core" DSO that I know will never be dlopened - it will always be loaded before main.
Code from other DSOs - which may be dlopened at arbitrary points - needs to use this TLS variable to perform "read-locking".
I would like to avoid the cost of __tls_get_addr, but at the same time I don't want to force client DSOs to build with -ftls-model=initial-exec (that could prevent them from being dlopened).
I would also like to avoid retrieving this TLS variable through a function call. It needs to be:

movq %fs:0, %rbx                    # thread pointer
movq my_tls@gottpoff(%rip), %rcx    # TP offset, filled into the GOT by the loader
movl (%rbx,%rcx), %eax              # eax holds the value of my_tls

Surely this is achievable?

Thanks,
Len


* RE: Accessing TLS variable defined in DSO with static tls model from another DSO with dynamic tls model
From: Alexander Monakov @ 2021-05-29  9:16 UTC (permalink / raw)
  To: Ho, Lennox; +Cc: gcc-help

On Fri, 28 May 2021, Ho, Lennox via Gcc-help wrote:

> > You need to pass -ftls-model=initial-exec when compiling libtls_import.so, not
> > the exporting library. I am a bit surprised you've decided to do that the other
> > way around :)
> 
> Ahh ok so it does appear that I really misunderstood how static TLS works :/.
> 
> My assumption has been that since (static) TLS variables are placed in the
> static TLS segment, at a constant offset (to the left) from the thread pointer
> (as shown here
> https://th.bing.com/th/id/R92210a8cb7df88cb1b9586ce73d8da90?rik=FX6vqTs2uG%2brzw&pid=ImgRaw),
> that initial-exec/local-exec is an imperative on the TLS data itself, not on
> the act of *accessing* said data.

On one hand, yes, it's a run-time property of the TLS symbol. On the other hand,
the compiler selects the most efficient code to access a TLS variable based on
what it knows about its location. By passing -ftls-model=initial-exec you're
promising to the compiler that each TLS symbol will be in the static TLS block.

> While tagging the DSO that accesses the TLS data (libtls_import.so) with
> DF_STATIC_TLS will do the trick (libtls_export.so must be loaded before
> libtls_import.so - which must be loaded before main() - and so the TLS in
> libtls_export.so can be placed in a static TLS section), surely tagging the
> export DSO (libtls_export.so) with DF_STATIC_TLS instead would be more
> natural?

Well, it's not clear. For the exporting module there's no difference whether its
TLS definitions are supposed to be in the static block or not. All the
difference is on the importing modules' side: one module could have
general-dynamic references via __tls_get_addr, and the other could have efficient
initial-exec references. Either module could be safely dlopen'ed provided that
the defining module was loaded at program startup.

> My understanding is that the dynamic loader will have the opportunity to fixup
> the offsets in any future DSO that is loaded.

But the dynamic loader does not edit the code (only the addresses in the GOT
table and other writable areas).

> Do you mind elaborating why DF_STATIC_TLS is placed on the client/import DSO
> and not the export DSO?

It's just for information. It somewhat matters when the DSO both defines and
references its own TLS data, but in your scenario it's moot. It's placed based
on the code performing the accesses (i.e. relocation kinds).
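To see which loaded objects carry the flag at run time (readelf -d shows the same thing as a FLAGS entry containing STATIC_TLS), here is a small sketch using dl_iterate_phdr() from <link.h>: it walks each object's PT_DYNAMIC segment and tests DF_STATIC_TLS in DT_FLAGS. The names check_object and report_static_tls are hypothetical:

```c
#define _GNU_SOURCE
#include <link.h>   /* dl_iterate_phdr, ElfW; pulls in <elf.h> for DF_STATIC_TLS */
#include <stdio.h>

/* Called once per loaded object: counts it and prints its name if its
   DT_FLAGS entry contains DF_STATIC_TLS. */
static int check_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *objects_seen = data;
    ++*objects_seen;

    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type != PT_DYNAMIC)
            continue;
        const ElfW(Dyn) *dyn =
            (const ElfW(Dyn) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
        for (; dyn->d_tag != DT_NULL; dyn++) {
            if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_STATIC_TLS))
                printf("DF_STATIC_TLS: %s\n",
                       info->dlpi_name[0] ? info->dlpi_name : "(main program)");
        }
    }
    return 0; /* non-zero would stop the iteration */
}

/* Returns the number of loaded objects inspected. */
int report_static_tls(void)
{
    int objects_seen = 0;
    dl_iterate_phdr(check_object, &objects_seen);
    return objects_seen;
}
```

Following the explanation above, with libtls_import.so built using the initial-exec attribute, one would expect libtls_import.so (the object containing the initial-exec relocations) to be reported, not libtls_export.so.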

> Is there something that I'm still missing?
> 
> -------------------
> 
> To offer some additional context, I have an unfair reader-writer mutex (unfair
> in that readers are heavily favoured) implementation that uses TLS variables
> to indicate whether a thread is currently "read-locking".  There are a few
> other bits and pieces required to make this work, but the advantage of this
> approach is that we completely eliminate cache contention (typical spin-locks
> have the problem of threads repeatedly invalidating each other's cache lines!) for
> readers.
> 
> Now, the way I'm deploying this TLS variable is by exporting it from a "core"
> DSO that I know will never be dlopened - it will always be loaded before main.
> Code from other DSOs - which may be dlopened at arbitrary points - needs to use
> this TLS variable to perform "read-locking".  I would like to avoid the cost
> of __tls_get_addr, but at the same time I don't want to force client DSOs to
> build with -ftls-model=initial-exec (that could prevent them from being
> dlopened).  I would also like to avoid retrieving this TLS variable through a
> function call. It needs to be:
> 
> movq %fs:0, %rbx                    # thread pointer
> movq my_tls@gottpoff(%rip), %rcx    # TP offset, filled into the GOT by the loader
> movl (%rbx,%rcx), %eax              # eax holds the value of my_tls
> 
> Surely this is achievable?

Just use the attribute on the specific variable as mentioned in my previous
email.

Alexander


* RE: Accessing TLS variable defined in DSO with static tls model from another DSO with dynamic tls model
From: Ho, Lennox @ 2021-05-29 16:27 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-help

> On one hand, yes, it's a run-time property of the TLS symbol. On the other hand, the compiler selects the most efficient code to access a TLS variable based on what it knows about its location. By passing -ftls-model=initial-exec you're promising to the compiler that each TLS symbol will be in the static TLS block.

That makes a lot of sense, but just bear with me a bit longer because I think we've hit the crux of what I'm trying to get at here.

Let's say we are using the attribute form of tls-model=initial-exec. That means applying __attribute__((tls_model("initial-exec"))) to our TLS variable.
Both the export and import DSOs (libtls_export.so and libtls_import.so) will then effectively be compiled with -ftls-model=initial-exec for that variable.

Theory crafting time:

Now, imagine (theoretically) that the DF_STATIC_TLS property were to ONLY be applied to DSOs that define/export at least one static TLS variable.
That means libtls_export.so would have DF_STATIC_TLS while libtls_import.so would NOT have DF_STATIC_TLS.

The compiler knows it can safely generate the efficient, offset-based access code for libtls_import.so (it can see the initial-exec attribute in the declaration of the TLS variable).
However, since libtls_import.so is just a consumer, it doesn't really need to be tagged with DF_STATIC_TLS.
The fact that libtls_export.so is tagged with DF_STATIC_TLS will ensure the TLS variable is placed in the static TLS segment (to the left of TP).
libtls_import.so can now be dlopened at arbitrary points.

I realize the attribute form of tls_model just ties into the command line form of the same setting, and the optimisation I've just described here cannot be safely achieved with the command line flag.
But I'm curious to know if what I said makes sense. *Could* this have worked (ignoring the fact that it totally breaks the ABI)?

In a similar vein, is there any attribute/feature in gcc today that could have achieved the same effect?
I know this is a weird use case, but I just want to see how far this can be pushed.

Thanks again,
Len

