From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-help-return-47460-listarch-gcc-help=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 29087 invoked by alias); 1 Sep 2011 01:55:16 -0000
Received: (qmail 29078 invoked by uid 22791); 1 Sep 2011 01:55:14 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0	tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,T_TO_NO_BRKTS_FREEMAIL
X-Spam-Check-By: sourceware.org
Received: from mail-gw0-f47.google.com (HELO mail-gw0-f47.google.com) (74.125.83.47)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 01 Sep 2011 01:54:59 +0000
Received: by gwb11 with SMTP id 11so628570gwb.20        for <gcc-help@gcc.gnu.org>; Wed, 31 Aug 2011 18:54:58 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.236.136.65 with SMTP id v41mr6070984yhi.29.1314842098636; Wed, 31 Aug 2011 18:54:58 -0700 (PDT)
Received: by 10.236.102.137 with HTTP; Wed, 31 Aug 2011 18:54:58 -0700 (PDT)
Date: Thu, 01 Sep 2011 01:55:00 -0000
Message-ID: <CAJR1fVoBxoqfBCAi8wY2qWWrCrjq-M4t56b2+ViraLRu1Kzy7g@mail.gmail.com>
Subject: TLS, gcc optimizations, and PIC on x86
From: Kevin Klues <klueska@cs.berkeley.edu>
To: gcc-help@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
X-IsSubscribed: yes
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-help.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-help/>
List-Post: <mailto:gcc-help@gcc.gnu.org>
List-Help: <mailto:gcc-help-help@gcc.gnu.org>
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2011-09/txt/msg00003.txt.bz2

I have a question regarding cached values for the addresses of TLS
variables in PIC when gcc optimizations are turned on.

Specifically, I want to be able to change the value of my TLS
descriptor and then access TLS variables through the new descriptor
within the body of a single function.  With gcc optimizations turned
on and when compiling for PIC, this doesn't always work.  The problem
is that, with optimizations turned on, the address computed for any
TLS variable accessed before the TLS descriptor is changed gets reused
for accesses to that variable after the descriptor has changed.  Note
that this is only really a problem with PIC, where addresses are
calculated via a function call to ___tls_get_addr() and gcc tries to
optimize repeated calls away (yes, even with
-ftls-model=global-dynamic explicitly set).  For non-PIC code
everything appears to be fine, since a different method of calculating
TLS variable addresses is used and it leaves no call to optimize away.

Consider the following pseudo code:

__thread int i = 0;
i = 5;  // Set variable in original tls region
set_tls_desc(new_tls);
i = 6;  // Set variable in new tls region

I want i = 6 to be applied to the i in my new TLS region, not the
original one.  However, with optimizations turned on (e.g. -O2), the i
from the old TLS region is usually the one whose value gets changed.
This makes sense, since there is no real way (that I am aware of) to
tell gcc that the TLS has changed out from under it, so all the -O2
optimizations that cache values in registers, etc. remain valid.
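
To make the pseudo code above concrete, here is a minimal sketch of the
access pattern.  It assumes a placeholder, set_tls_desc_stub(), which is
a hypothetical stand-in for set_tls_desc() and deliberately does NOT
switch the TLS region, so that the file builds and runs on its own.
The function name write_across_switch() is likewise made up for
illustration.

```c
#include <assert.h>

__thread int i = 0;

/* Hypothetical stand-in for set_tls_desc(): it does not touch the TLS
 * descriptor at all; it only acts as an opaque, non-inlined call with
 * a compiler memory barrier. */
static void __attribute__((noinline)) set_tls_desc_stub(void *new_tls)
{
    (void)new_tls;
    __asm__ __volatile__("" ::: "memory");
}

/* Same shape as the pseudo code: one store before the "switch" and
 * one store after it, both inside a single function body. */
int write_across_switch(void *new_tls)
{
    i = 5;                      /* store in the original TLS region */
    assert(i == 5);
    set_tls_desc_stub(new_tls);
    i = 6;                      /* intended for the new TLS region */
    return i;
}
```

Compiling this with something like gcc -O2 -fPIC -S and looking at how
many times the TLS address helper is called inside
write_across_switch() is one way to check whether the address of i is
computed once or twice.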

One thing that had occurred to me was to force a compiler memory
barrier after set_tls_desc() (i.e. asm volatile("" ::: "memory")), but
this doesn't work because the contents of the variables I have access
to (i.e. i in this case) haven't changed.  Rather, it's the address of
the variable i, as calculated via a call to __tls_get_addr(), that has
now changed.

Essentially, I want a way to inform the compiler that any TLS
variables accessed after my call to set_tls_desc() need to have their
addresses refreshed via a new call to __tls_get_addr().  I was hoping
I could accomplish this via something like asm volatile ("" ::: "%gs")
for i386 or asm volatile ("" ::: "%fs") on x86_64, but apparently
these registers aren't even allowed in the clobber list....

Therefore, my current workaround is to define the following macros:

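/* Store val into the TLS variable `name` from inside a non-inlined,
   unoptimized nested function (GNU C extensions). */
#define safe_set_tls_var(name, val)                        \
({                                                         \
  void __attribute__((noinline, optimize("O0")))           \
  safe_set_tls_var_internal() {                            \
    asm("");                                               \
    name = val;                                            \
  } safe_set_tls_var_internal();                           \
})

/* Read the TLS variable `name` the same way; the statement
   expression evaluates to the value read. */
#define safe_get_tls_var(name)                             \
({                                                         \
  typeof(name) __attribute__((noinline, optimize("O0")))   \
  safe_get_tls_var_internal() {                            \
    return name;                                           \
  } safe_get_tls_var_internal();                           \
})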

These macros create a nested function at each call site where a TLS
variable is read or written, with the attributes 'noinline' and
'optimize("O0")'.  These attributes make sure that the get/set
operations actually load the TLS address via a call to
___tls_get_addr() instead of using a cached value.  Furthermore, by
making these macros that define nested functions, instead of defining
a single pair of functions that take a set of parameters, I am able to
reference the specific TLS variable I want to access BY NAME at the
call site, and don't have to pass its address instead.  This forces a
recalculation of the address of the specific TLS variable in question
within the body of the nested function.

The code then becomes:

__thread int i = 0;
i = 5;
set_tls_desc(new_tls);
safe_set_tls_var(i, 6);
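
One possible variation on the same idea, as a rough sketch only: emit
a file-scope getter/setter pair per TLS variable, so that no nested
functions are needed.  The macro DEFINE_TLS_ACCESSORS and the
generated tls_get_i()/tls_set_i() names are hypothetical and not part
of the macros above.  The sketch relies on 'noinline' alone and
assumes that keeping the access in a separate function body is enough
to get the address computed there; that assumption is not verified
for every gcc version or optimization level.

```c
#include <assert.h>

/* Hypothetical helper macro: defines tls_get_<name>() and
 * tls_set_<name>() at file scope for one TLS variable.  Each function
 * refers to the variable by name inside its own non-inlined body. */
#define DEFINE_TLS_ACCESSORS(type, name)                          \
    static type __attribute__((noinline)) tls_get_##name(void)    \
    {                                                             \
        return name;                                              \
    }                                                             \
    static void __attribute__((noinline)) tls_set_##name(type v)  \
    {                                                             \
        name = v;                                                 \
    }

__thread int i = 0;
DEFINE_TLS_ACCESSORS(int, i)

/* Illustrative use: write through the setter, read through the getter. */
int accessor_demo(void)
{
    tls_set_i(6);
    assert(tls_get_i() == 6);
    return tls_get_i();
}
```

The trade-off is one accessor pair per variable declared up front,
in exchange for avoiding nested functions at every call site.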

This can get cumbersome though, if I keep accessing TLS variables over
and over again further down in the function.  I could always break up
the function into multiple pieces to force all TLS variables to be
accessed in a new function after the call to set_tls_desc(), but it
doesn't seem like something I should have to do.
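
For reference, the "break up the function" approach could look
roughly like the sketch below.  The names set_tls_desc_stub(),
after_tls_switch() and switch_and_update() are hypothetical; the stub
stands in for set_tls_desc() and does NOT actually switch the TLS
region, so the example runs on its own.  The assumption is that
'noinline' keeps the post-switch accesses in a separate function body,
where their addresses are computed afresh.

```c
#include <assert.h>

__thread int i = 0;
__thread int j = 0;

/* Hypothetical stand-in for set_tls_desc(); it does not change the
 * TLS descriptor, it is only an opaque non-inlined call. */
static void __attribute__((noinline)) set_tls_desc_stub(void *new_tls)
{
    (void)new_tls;
    __asm__ __volatile__("" ::: "memory");
}

/* Every TLS access that must see the new region lives in this
 * function, which is never inlined back into the caller. */
static int __attribute__((noinline)) after_tls_switch(int new_i, int new_j)
{
    i = new_i;
    j = new_j;
    return i + j;
}

int switch_and_update(void *new_tls)
{
    i = 5;                          /* access before the switch */
    set_tls_desc_stub(new_tls);
    int sum = after_tls_switch(6, 7);
    assert(sum == 13);
    return sum;
}
```

Note that the caller must not touch i or j directly again after the
switch, or it is back to the original problem.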

As I said before, what I really want is something like: asm volatile
("" ::: "%gs") (or better yet, asm volatile ("" ::: "tls")) to hint to
the compiler that the addresses of all TLS variables accessed after
this point need to be recalculated.

For all I know though, maybe something like this already exists.
Does anyone know of something like this that's already out there?

-- 
~Kevin