Subject: TLS, gcc optimizations, and PIC on x86
From: Kevin Klues
To: gcc-help@gcc.gnu.org
Date: Thu, 01 Sep 2011 01:55:00 -0000

I have a question about cached addresses of TLS variables in PIC code when gcc optimizations are turned on. Specifically, I want to be able to change my TLS descriptor and then access TLS variables through the new descriptor, all within the body of a single function. With gcc optimizations turned on and when compiling for PIC, this doesn't always work. The problem is that, with optimizations turned on, the addresses of any TLS variables accessed before the TLS descriptor is changed persist to accesses of those same variables after the descriptor has changed.

Note that this is only really a problem with PIC, because there the addresses are calculated via a function call to ___tls_get_addr(), and gcc tries to optimize these calls away (yes, even with tls-model=global-dynamic explicitly set). For non-PIC code everything appears to be fine, since a different method of calculating TLS variable addresses is used, one that doesn't need to be optimized.

Consider the following pseudo code:

    __thread int i = 0;

    i = 5;                  // Set variable in original tls region
    set_tls_desc(new_tls);
    i = 6;                  // Set variable in new tls region

I want i = 6 to be set for the i in my new tls region, not the original one. However, with optimizations turned on (i.e. -O2), the i from the old tls region is usually the one whose value gets changed. This makes sense, since there is no real way (that I am aware of) to tell gcc that the TLS has changed out from under it, so all of the O2 optimizations for caching values in registers, etc. are valid from the compiler's point of view.

One thing that occurred to me was to force a compiler memory barrier after set_tls_desc() (i.e. asm volatile("" ::: "memory")), but this doesn't work, because no memory is changing for the variables I have access to (i.e. i in this case). Rather, it is the address of the variable i, as calculated via a call to __tls_get_addr(), that has changed.

Essentially, I want a way to inform the compiler that any TLS variables accessed after my call to set_tls_desc() need to have their addresses refreshed via a new call to __tls_get_addr(). I was hoping I could accomplish this via something like asm volatile ("" ::: "%gs") on i386 or asm volatile ("" ::: "%fs") on x86_64, but apparently these registers aren't even allowed in the clobber list.
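
To make the pseudo code concrete, here is a minimal compilable sketch of the situation, including the memory barrier that does not help. The real set_tls_desc() is not shown in this mail, so the version below is only a hypothetical stand-in that records its argument, and fake_region is likewise a made-up placeholder for new_tls. With this stub nothing actually switches, so both stores land in the same i; the sketch is only meant for looking at the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for the new TLS region (not from the real code). */
char fake_region[64];

/* Hypothetical stand-in: the real set_tls_desc() would repoint the
 * thread's TLS base (%gs on i386, %fs on x86_64). This stub only
 * records the pointer so that the example builds and runs. */
static void *current_desc;

__attribute__((noinline))
void set_tls_desc(void *new_tls)
{
    assert(new_tls != NULL);
    current_desc = new_tls;
}

__thread int i = 0;

int demo(void *new_tls)
{
    i = 5;                    /* store through the address of i computed here */
    set_tls_desc(new_tls);

    /* Compiler barrier: forces values in memory to be reloaded, but says
     * nothing about the address of i itself having changed. */
    __asm__ __volatile__("" ::: "memory");

    i = 6;                    /* at -O2 with PIC this may reuse the address
                                 computed for the first store */
    return i;
}
```

Compiling this with something like gcc -O2 -fPIC -S and looking for the TLS address computation in the output should show whether the address of i is computed once or twice in demo(); the exact result may depend on the gcc version and the TLS model it picks.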

Therefore, my current solution is to define the following macros:

    /* Both macros rely on GNU C statement expressions and nested functions. */
    #define safe_set_tls_var(name, val)                         \
    ({                                                          \
        void __attribute__((noinline, optimize("O0")))          \
        safe_set_tls_var_internal() {                           \
            asm("");                                            \
            name = val;                                         \
        }                                                       \
        safe_set_tls_var_internal();                            \
    })

    #define safe_get_tls_var(name)                              \
    ({                                                          \
        typeof(name) __attribute__((noinline, optimize("O0")))  \
        safe_get_tls_var_internal() {                           \
            return name;                                        \
        }                                                       \
        safe_get_tls_var_internal();                            \
    })

These macros create a nested function at each call site where a TLS variable is read or written, with the attributes 'noinline' and 'optimize("O0")'. These attributes make sure that the get/set operation actually loads the TLS address via a call to ___tls_get_addr() instead of using a cached value. Furthermore, by using macros that define nested functions, rather than a single pair of functions that take parameters, I am able to reference the specific TLS variable I want BY NAME at the call site, and don't have to pass its address instead. This forces the address of that specific TLS variable to be recalculated within the body of the nested function.

The code then becomes:

    __thread int i = 0;

    i = 5;
    set_tls_desc(new_tls);
    safe_set_tls_var(i, 6);

This can get cumbersome, though, if I keep accessing TLS variables over and over again further down in the function. I could always break the function up into multiple pieces, so that all TLS variables accessed after the call to set_tls_desc() are accessed in a new function, but that doesn't seem like something I should have to do.

As I said before, what I really want is something like asm volatile ("" ::: "%gs") (or better yet, asm volatile ("" ::: "tls")) to hint to the compiler that the addresses of all TLS variables need to be recalculated after this point. For all I know, though, something like this already exists. Does anyone know of anything like this that's already out there?

--
~Kevin
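
P.S. For concreteness, the "break the function up into multiple pieces" alternative I mentioned would look roughly like the sketch below. Again, set_tls_desc() here is only a hypothetical stub that records its argument, and fake_region, after_switch() and switch_and_set() are names made up for this example. The assumption, which I have not verified across gcc versions, is that 'noinline' is enough to keep the second half from being merged back into the caller, so that the address of i is computed afresh inside it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for the new TLS region (not from the real code). */
char fake_region[64];

/* Hypothetical stand-in for the real set_tls_desc(); it only records
 * the pointer so that the example builds and runs. */
static void *current_desc;

__attribute__((noinline))
void set_tls_desc(void *new_tls)
{
    assert(new_tls != NULL);
    current_desc = new_tls;
}

__thread int i = 0;

/* Everything that touches TLS after the switch lives in this function.
 * Because it is not inlined, it cannot reuse an address of i that the
 * caller computed before the switch. */
__attribute__((noinline))
static int after_switch(int val)
{
    i = val;
    return i;
}

int switch_and_set(void *new_tls, int val)
{
    i = 5;                      /* access through the original TLS region */
    set_tls_desc(new_tls);
    return after_switch(val);   /* all later TLS accesses happen in here */
}
```

A call such as switch_and_set(fake_region, 6) then plays the role of the three-line sequence above. It avoids the per-access macros, but every function that switches descriptors has to be split by hand in this way, which is exactly the part I would like to avoid.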