TLS, gcc optimizations, and PIC on x86

public inbox for gcc-help@gcc.gnu.org

* TLS, gcc optimizations, and PIC on x86
@ 2011-09-01  1:55 Kevin Klues
  2011-09-01  5:01 ` Ian Lance Taylor
From: Kevin Klues @ 2011-09-01  1:55 UTC
  To: gcc-help

I have a question regarding cached values for the addresses of TLS
variables in PIC when gcc optimizations are turned on.

Specifically, I want to be able to change the value of my TLS
descriptor as well as access TLS variables from that new descriptor
within the body of a single function.  With gcc optimizations turned
on and when compiling for PIC, this doesn't always seem to work.  The
problem is that, with optimizations turned on, the addresses of any
TLS variables accessed before the TLS descriptor is changed are reused
for accesses to those variables after the change.  Note that this is
only really a problem with PIC, where addresses are calculated via a
function call to ___tls_get_addr() and gcc tries to optimize these
calls away (yes, even with tls-model=global-dynamic explicitly set).
For non-PIC code everything appears to be fine, since a different
method of calculating TLS variable addresses is used, one that
involves no call for gcc to optimize away.
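To make the failure mode concrete, here is a hand-written C model of it (an illustration only, not actual gcc output). A global pointer stands in for the TLS descriptor, model_tls_get_addr() stands in for ___tls_get_addr(), and model_set_tls_desc() stands in for set_tls_desc(); all of these names are hypothetical. The first version reuses the address it computed before the switch, mirroring the behaviour described above; the second looks the address up again after the switch.

```c
#include <assert.h>

/* Hand-written model of the failure, NOT gcc output.  Each struct plays the
   role of one TLS block, and current_desc plays the role of the TLS
   descriptor (the thread pointer).  All names here are hypothetical. */
struct tls_block { int i; };

static struct tls_block block_a, block_b;
static struct tls_block *current_desc = &block_a;

/* Stand-in for ___tls_get_addr(): the answer depends on the descriptor. */
static int *model_tls_get_addr(void) { return &current_desc->i; }

/* Stand-in for set_tls_desc(): switch to another TLS block. */
static void model_set_tls_desc(struct tls_block *desc) { current_desc = desc; }

static void reset_model(void) {
  block_a.i = 0;
  block_b.i = 0;
  current_desc = &block_a;
}

/* Address computed once and reused after the switch (the reported bug). */
static void cached_address_version(void) {
  int *p = model_tls_get_addr();
  *p = 5;                        /* writes block_a.i */
  model_set_tls_desc(&block_b);
  *p = 6;                        /* stale address: writes block_a.i again */
}

/* Address looked up again after the switch (the intended behaviour). */
static void fresh_address_version(void) {
  *model_tls_get_addr() = 5;     /* writes block_a.i */
  model_set_tls_desc(&block_b);
  *model_tls_get_addr() = 6;     /* writes block_b.i */
}

static void demo(void) {
  reset_model();
  cached_address_version();
  assert(block_a.i == 6 && block_b.i == 0);  /* new block never written */

  reset_model();
  fresh_address_version();
  assert(block_a.i == 5 && block_b.i == 6);
}
```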

Consider the following pseudo code:

__thread int i = 0;
i = 5;  // Set variable in original tls region
set_tls_desc(new_tls);
i = 6;  // Set variable in new tls region

I want i=6 to be set for the i in my new tls region, not the original
one.  However, with optimizations turned on (i.e. -O2), the i from the
old tls region is usually the one whose value gets changed.  I mean,
this makes sense since there is no real way (that I am aware of) to
tell gcc that the TLS has changed out from under it -- all O2
optimizations for caching variables in registers, etc. are valid.

One thing that had occurred to me was to force a compiler memory
barrier after set_tls_desc() (i.e. asm volatile("" ::: "memory")), but
this doesn't work because there's not any memory changing for the
variables I have access to (i.e. i in this case) - but rather it's the
address of the variable i as calculated via a call to __tls_get_addr()
that has now changed....

Essentially, I want a way to inform the compiler that any TLS
variables accessed after my call to set_tls_desc() need to have their
addresses refreshed via a new call to __tls_get_addr().  I was hoping
I could accomplish this via something like asm volatile ("" ::: "%gs")
for i386 or asm volatile ("" ::: "%fs") on x86_64, but apparently
these registers aren't even allowed in the clobber list....

Therefore, my current solution to solve this problem is to define the
following macros:

/* GNU C only: statement expressions, nested functions and typeof;
   the optimize attribute needs gcc >= 4.4. */
#define safe_set_tls_var(name, val)                        \
({                                                         \
  void __attribute__((noinline, optimize("O0")))           \
  safe_set_tls_var_internal(void) {                        \
    asm("");                                               \
    name = val;                                            \
  } safe_set_tls_var_internal();                           \
})

#define safe_get_tls_var(name)                             \
({                                                         \
  typeof(name) __attribute__((noinline, optimize("O0")))   \
  safe_get_tls_var_internal(void) {                        \
    return name;                                           \
  } safe_get_tls_var_internal();                           \
})

These macros create a nested function at each call site where a TLS
variable is read or written, with the attributes 'noinline' and
'optimize("O0")'.  These attributes make sure that the get/set
operations actually load the TLS address via a call to
___tls_get_addr() instead of using a cached value.  Furthermore, by
making these macros define nested functions, instead of defining a
single pair of functions that take parameters, I can reference the
specific TLS variable I want to access BY NAME at the call site,
rather than passing its address.  This forces a recalculation of the
address of the TLS variable in question within the body of the nested
function.

The code then becomes:

__thread int i = 0;
i = 5;
set_tls_desc(new_tls);
safe_set_tls_var(i, 6);

This can get cumbersome though, if I keep accessing TLS variables over
and over again further down in the function.  I could always break up
the function into multiple pieces to force all TLS variables to be
accessed in a new function after the call to set_tls_desc(), but it
doesn't seem like something I should have to do.
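The "break the function up" alternative could look roughly like the sketch below. set_tls_desc() here is a hypothetical no-op stub (a real one would install a new TLS block for the thread), and after_switch() and switch_and_continue() are illustrative names. The idea is only that code placed in a separate, non-inlined function computes its own TLS addresses when it runs instead of inheriting ones cached by the caller; since the thread later reports that noinline alone was not always enough, it is worth confirming this in the generated assembly.

```c
#include <assert.h>

__thread int i = 0;

/* Hypothetical no-op stub for the poster's set_tls_desc(); a real version
   would install a new TLS block for the calling thread. */
static __attribute__((noinline)) void set_tls_desc(void *desc) {
  (void)desc;
  __asm__ __volatile__("" ::: "memory");
}

/* All TLS accesses that must see the new block live in a separate,
   non-inlined function, so their addresses are computed in this function's
   own body rather than reused from the caller. */
static __attribute__((noinline)) void after_switch(void) {
  i = 6;
  /* ... further TLS accesses ... */
}

static void switch_and_continue(void *new_tls) {
  i = 5;                   /* original TLS block */
  set_tls_desc(new_tls);
  after_switch();          /* new TLS block, with a real set_tls_desc() */
}

/* Single-threaded check with the no-op stub: the helper's store is visible. */
static void check_split(void) {
  switch_and_continue(0);
  assert(i == 6);
}
```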

As I said before, what I really want is something like asm volatile
("" ::: "%gs") (or better yet, asm volatile ("" ::: "tls")) to hint to
the compiler that the addresses of all TLS variables after this point
need to be recalculated.

For all I know though, maybe something like this already exists.
Does anyone know of something like this that's already out there?

-- 
~Kevin


* Re: TLS, gcc optimizations, and PIC on x86
  2011-09-01  1:55 TLS, gcc optimizations, and PIC on x86 Kevin Klues
@ 2011-09-01  5:01 ` Ian Lance Taylor
  2011-09-01  7:15   ` Kevin Klues
From: Ian Lance Taylor @ 2011-09-01  5:01 UTC
  To: Kevin Klues; +Cc: gcc-help

Kevin Klues <klueska@cs.berkeley.edu> writes:

> Specifically, I want to be able to change the value of my TLS
> descriptor as well as access TLS variables from that new descriptor
> within the body of a single function.

As you've discovered, gcc does not support that.  It seems like a highly
unusual feature to want.  It would not be hard to change gcc to add an
option to support this, but I'm not sure it's worth the ongoing
maintenance cost.

Ian


* Re: TLS, gcc optimizations, and PIC on x86
  2011-09-01  5:01 ` Ian Lance Taylor
@ 2011-09-01  7:15   ` Kevin Klues
  2011-09-06  7:10     ` Ian Lance Taylor
From: Kevin Klues @ 2011-09-01  7:15 UTC
  To: Ian Lance Taylor; +Cc: gcc-help

Fair enough.  My current solution is adequate for our purposes, as
(like you said) this isn't a very common operation to perform....

That said, do you see any obvious issues with my solution?  It seems
to work for all of the test cases I've thrown at it, but I could be
missing something.  Additionally, do you have any suggestions for a
better method that achieves similar results?  Ideally I'd like a
solution that didn't require the use of the 'optimize' attribute as
(unfortunately) some of the systems on which we'd like to compile our
code still use gcc < 4.4.

Kevin


* Re: TLS, gcc optimizations, and PIC on x86
  2011-09-01  7:15   ` Kevin Klues
@ 2011-09-06  7:10     ` Ian Lance Taylor
  2011-09-06 20:31       ` Kevin Klues
From: Ian Lance Taylor @ 2011-09-06  7:10 UTC
  To: Kevin Klues; +Cc: gcc-help

Kevin Klues <klueska@cs.berkeley.edu> writes:

> That said, do you see any obvious issues with my solution?  It seems
> to work for all of the test cases I've thrown at it, but I could be
> missing something.  Additionally, do you have any suggestions for a
> better method that achieves similar results?  Ideally I'd like a
> solution that didn't require the use of the 'optimize' attribute as
> (unfortunately) some of the systems on which we'd like to compile our
> code still use gcc < 4.4.

I don't see any obvious issues with your solution.  I'm not sure why you
need to use the optimize attribute; I would have expected that the
noinline attribute would be sufficient here.

Ian


* Re: TLS, gcc optimizations, and PIC on x86
  2011-09-06  7:10     ` Ian Lance Taylor
@ 2011-09-06 20:31       ` Kevin Klues
From: Kevin Klues @ 2011-09-06 20:31 UTC
  To: Ian Lance Taylor; +Cc: gcc-help

It looks like 'optimize' is necessary in cases where I do something like:

__thread int i = 0;
i = 5;  // Set variable in original tls region
set_tls_desc(new_tls);
i = 5;  // Set variable in new tls region

Where I set i equal to the same value, but in 2 different TLS regions.
Glancing through the assembly, it appears that the code for the
nested function does get generated (as per the 'noinline'), but it
simply does nothing and returns.
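One possible variant that avoids the 'optimize' attribute and nested functions (and so might also build with gcc < 4.4, though that is untested here) is to generate a file-scope, noinline accessor per TLS variable that returns the variable's address. An empty volatile asm with a "+r" operand makes the accessor impure and hides the returned pointer from the optimizer. This is a sketch under assumptions, not something proposed in the thread: the macro DEFINE_TLS_ADDR and the generated NAME_tls_addr() functions are hypothetical, and whether it avoids the no-op described above should be checked in the generated assembly.

```c
#include <assert.h>

__thread int i = 0;

/* Hypothetical macro: defines NAME_tls_addr(), a file-scope function that
   returns the current address of the TLS variable NAME.
   - noinline keeps the address computation out of the caller, so each call
     computes the address again when it runs;
   - the volatile asm makes the function impure (so calls should not be
     merged) and the "+r" operand hides the returned pointer's value from
     the optimizer.
   No optimize attribute and no nested functions are needed. */
#define DEFINE_TLS_ADDR(type, name)                              \
  static __attribute__((noinline)) type *name##_tls_addr(void)   \
  {                                                              \
    type *p = &name;                                             \
    __asm__ __volatile__("" : "+r"(p));                          \
    return p;                                                    \
  }

DEFINE_TLS_ADDR(int, i)

static void check_accessor(void) {
  *i_tls_addr() = 5;   /* original TLS block */
  /* set_tls_desc(new_tls) from the thread would go here */
  *i_tls_addr() = 5;   /* same value again, through a fresh lookup */
  assert(i == 5);
  assert(i_tls_addr() == &i);  /* single thread, no switch: same address */
}
```

Each access after the descriptor switch then becomes `*i_tls_addr() = 6;`. The cost is one out-of-line call per access, which seems acceptable given that, as noted earlier in the thread, this is not a common operation.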

Kevin

-- 
~Kevin

