public inbox for gcc-bugs@sourceware.org
From: "loganh at synopsys dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/109164] New: aarch64 thread_local initialization error with -ftree-pre and -foptimize-sibling-calls
Date: Thu, 16 Mar 2023 23:09:34 +0000
Message-ID: <bug-109164-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109164

            Bug ID: 109164
           Summary: aarch64 thread_local initialization error with
                    -ftree-pre and -foptimize-sibling-calls
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: loganh at synopsys dot com
  Target Milestone: ---

Created attachment 54687
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54687&action=edit
Bash script that reproduces the issue

With -ftree-pre, -foptimize-sibling-calls, and -O1 enabled on
aarch64-linux-gnu, GCC 12.1.0 can generate code that accesses parts of a
thread_local variable before the corresponding TLS init function is called, if
the variable is accessed from a different TU than the one it is defined in.
This reordering could likely cause a number of different issues, but the one
that I've run into is that:
- When the compiler generates code to call a virtual function on a reference to
a global thread_local instance of an object defined in a different
translation unit, and
- The function calls itself in at least one branch,
the address of the object is fetched from TLS before it's initialized, and
when the vtable lookup is attempted on that object to call the virtual
function, the program segfaults.

Here's an example of the kind of code that will trip it up:

struct Struct {
  virtual void virtual_func();
};

extern thread_local Struct& thread_local_ref;

bool other_func(void);

bool test_func(void) {
  thread_local_ref.virtual_func();
  return other_func() && test_func();
}

When this is compiled (on aarch64-linux-gnu, with -O1 and -ftree-pre and
-foptimize-sibling-calls) to an object file and then dumped with objdump -C -d,
this is the code produced:

0000000000000000 <test_func()>:
    0: a9be7bfd  stp x29, x30, [sp, #-32]!
    4: 910003fd  mov x29, sp
    8: a90153f3  stp x19, x20, [sp, #16]
    c: 90000000  adrp  x0, 0 <thread_local_ref>
   10: f9400000  ldr x0, [x0]
   14: d53bd041  mrs x1, tpidr_el0
   18: f8606834  ldr x20, [x1, x0]
   1c: 90000013  adrp  x19, 0 <TLS init function for thread_local_ref>
   20: f9400273  ldr x19, [x19]
   24: b4000053  cbz x19, 2c <test_func()+0x2c>
   28: 94000000  bl  0 <TLS init function for thread_local_ref>
   2c: f9400280  ldr x0, [x20]
   30: f9400001  ldr x1, [x0]
   34: aa1403e0  mov x0, x20
   38: d63f0020  blr x1
   3c: 94000000  bl  0 <other_func()>
   40: 12001c00  and w0, w0, #0xff
   44: 35ffff00  cbnz  w0, 24 <test_func()+0x24>
   48: a94153f3  ldp x19, x20, [sp, #16]
   4c: a8c27bfd  ldp x29, x30, [sp], #32
   50: d65f03c0  ret

Looking at addresses 0x14 through 0x18, you can see that the address of
`thread_local_ref` is read from the TLS block for the thread; the first time
this function is called, this will result in register x20 containing zero,
since the TLS block isn't initialized until the function call at 0x28. Directly
after that, at location 0x2c, a read is attempted from the address in register
x20 (zero), causing a segfault. Without -ftree-pre and -foptimize-sibling-calls,
and without letting `test_func` call itself on at least one path, the code to
get the address of `thread_local_ref` is generated after the TLS init call, so
the problem does not occur.
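
The two orderings described above can be modelled in a single file. This is
only a rough sketch and not the code GCC emits: it does not reproduce the
miscompile, it just shows why reading the slot before the init call yields
zero. The names `slot`, `tls_init`, `load_after_init` and `load_before_init`
are hypothetical, and `virtual_func` is given a return value here purely so
the result can be checked.

```cpp
#include <cassert>

// Stand-in for the object that the thread_local reference is bound to.
// (Hypothetical: virtual_func returns int only so callers can check it.)
struct Struct {
  virtual ~Struct() = default;
  virtual int virtual_func() { return 42; }
};

// Models the per-thread slot holding the address the reference is bound to.
// Like the real TLS block, it is zero until the init function has run.
thread_local Struct* slot = nullptr;

// Models "TLS init function for thread_local_ref": binds the slot on first use.
void tls_init() {
  thread_local Struct instance;
  if (slot == nullptr) slot = &instance;
}

// Expected ordering: call the init function, then read the slot.
Struct* load_after_init() {
  tls_init();
  assert(slot != nullptr);  // the init call guarantees the slot is bound
  return slot;
}

// Ordering seen in the disassembly: the slot is read (0x14-0x18) before the
// init call (0x28), so the first call on a thread sees the stale zero value.
Struct* load_before_init() {
  Struct* hoisted = slot;
  tls_init();
  return hoisted;
}
```

On a thread that has not yet touched the slot, `load_before_init()` returns a
null pointer, and dereferencing it for the vtable lookup would fault in the
same way as the load at 0x2c; `load_after_init()` returns a usable object.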

I've attached a script that will reproduce what I've shown here, as well as
demonstrate the issue in action with a full executable that will produce the
segfault I've described.
