Subject: Re: Cross-CU C++ DIE references vs. mangling
From: Sami Wagiaalla
To: Roland McGrath
Cc: Jan Kratochvil, archer@sourceware.org, Keith Seitz
Date: Mon, 12 Apr 2010 18:51:00 -0000
Message-ID: <1271098008.2901.211.camel@localhost>
In-Reply-To: <20100311060305.B177A7D5E@magilla.sf.frob.com>
References: <20100310191833.GA2816@host0.dyn.jankratochvil.net> <20100310193207.GA6147@host0.dyn.jankratochvil.net> <20100311060305.B177A7D5E@magilla.sf.frob.com>

So after a few (really, many) reads of this email I think I can
summarize the issues and solutions discussed there. I just wanted to
make sure I have a proper understanding of the issue before filing a
gcc feature request.

So, is this a correct summary:

The goal is to help gdb find the proper location for variables whose
declarations and definitions are separated across CUs or .so's.

Why can't gdb do this by itself? Because:

 - It requires a search of all other CUs/.so's to locate the
   definition, which is inefficient but also inaccurate, since:
   - The scope of the declaration can be different from that of the
     definition (e.g. class members). If DW_AT_MIPS_linkage_name is
     available it can be used to resolve this, however:
   - If the definition is in a stripped DSO there is indeed a
     definition (ELF), but nowhere is there a DW_AT_location pointing
     to it.

Also,

 - It is possible to have two names defined in two separate .so's
   with the same linkage name, e.g.:

> Consider:
>
> $ g++ -g -c -fPIC -o foo1.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { int i; };')
> $ g++ -g -c -fPIC -o foo2.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { extern int i; }; int foo () { return internal::i; }')
> $ gcc -g -shared -o foo.so foo1.o foo2.o
> $ g++ -g -c -fPIC -o bar1.o -xc++ <(echo 'namespace internal { int i; };')
> $ g++ -g -c -fPIC -o bar2.o -xc++ <(echo 'namespace internal { extern int i; }; int bar () { return internal::i; }')
> $ gcc -g -shared -o bar.so bar1.o bar2.o
> $ eu-readelf -sr -winfo foo.so bar.so
>
> Now imagine a program linking in both foo.so and bar.so. There are
> two different things that are both separate but equal and both truly
> internal::i and both truly _ZN8internal1iE. By any method, there is
> no one answer to, "What is internal::i?" The only answers are
> context-specific.
>

Proposed solution:

Teach the compiler to generate a DW_AT_location for a non-defining
declaration that is applicable in that DIE's scope. That location
expression would parallel the assembly generated for the symbol:

> The key is that you can have the same(ish) relocs using the same
> symbols in the code and DWARF as assembled. Then whatever happens
> in linking stages later should be the same[...]

So,

> For non-PIC code, the actual code looks like:
>
>     movl _ZN8internal1iE(%rip), %eax
>
> and the DWARF bit could look like:
>
>     .byte DW_OP_addr
>     .quad _ZN8internal1iE
> [...]
> These get resolved at link time to absolute addresses, et voila.

And,

> In a PIC access, what the final code will actually do is not really
> related to anything about ELF symbols. It's just memory indirection.
> The PIC code is:
>
>     movq _ZN8internal1iE@GOTPCREL(%rip), %rax
>     movl (%rax), %eax
> [...]
>     .byte DW_OP_addr
>     .quad _ZN8internal1iE@GOT
>     .byte DW_OP_deref
>
> This generates R_X86_64_GOT64. At link time, this too goes away and
> becomes the "absolute" address of the .got slot.

The following part I don't quite understand:

> We could certainly teach GCC to do this.
> It would then be telling us more pieces of direct truth about the code.
> Would that not be the best thing ever?
> Well, almost.
>
> First, what about a defining declaration in a PIC CU?
>
> In the abstract, a defining declaration can be considered as talking
> about two different things. One is its declarationhood, wherein it
> says that the containing scope has this name visible. For that
> purpose, it could reasonably be expected to be like a non-defining
> declaration: say how code in this scope accesses the variable--the
> truth about what's in the assembly code for any accesses in that CU.
> But the other thing is its definitionhood, wherein it says what data
> address contains the data cell and thus (optionally) implies what
> object file position holds the initializer image--another truth about
> what's in the assembly code for the definition in this CU.
>
> In non-PIC code, these two truths match. Both use direct address
> constants (as relocated at link time). But in PIC code, the truth
> about the definition is an address constant, while the truth about the
> access is an indirection through .got. (If you have PIC code that
> uses __attribute__((visibility("hidden"))) then it's direct access,
> though PC-relative, and thus "non-PIC" ("absolute") for DWARF
> purposes, so both truths match as in truly non-PIC code.)
>
> Personally, I would be all for having it both ways. In a CU where a
> defining declaration is actually used by PIC accesses, then you could
> generate a second non-defining declaration (even for C).
> Give it
> DW_AT_artificial, DW_AT_declaration, DW_AT_specification pointing to
> the defining declaration (in lieu of DW_AT_name, DW_AT_type, et al),
> and then DW_AT_location with the PIC style using indirection.
>
> With that, you could know that if you got a DW_AT_location from any
> DIE with DW_AT_declaration then you're done and have the real truth
> for accesses. If we presume no CUs from pre-apocalyptic compilers now
> that we are in these here end times, then we are finally free from
> ever having to rely on discerning the right ELF symbol from a name we
> surmised from DWARF (be it via DW_AT_MIPS_linkage_name or mangling).
>

Why is there a need for a second, artificial, location-describing DIE?
As I understand it, declarationhood is specified by the DIE's nesting
in the DIE hierarchy, not by its DW_AT_location. In other words, what
is missing in the way gcc currently specifies locations for defining
declarations?

This summary does not include the part starting with "Before dynamic
linker startup" through the end of the email, mainly because I am
assuming that the main use case is after dynamic linker startup.
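
To check my reading of the two quoted location styles, here is a toy
sketch in Python. It is only a model, not gdb or gcc code: the
addresses, the memory dictionary and the evaluate() helper are all
invented for illustration. It assumes the linker and dynamic linker
have already run, so the .got slot holds the variable's final address.

```python
# Toy model only: addresses and memory contents are made up for illustration.
DW_OP_addr = "DW_OP_addr"
DW_OP_deref = "DW_OP_deref"


def evaluate(expr, memory):
    """Evaluate a list of (opcode, operand) pairs on a DWARF-style stack."""
    stack = []
    for op, operand in expr:
        if op == DW_OP_addr:
            # Push an address constant (already resolved at link time).
            stack.append(operand)
        elif op == DW_OP_deref:
            # Pop an address and push the word stored at that address.
            stack.append(memory[stack.pop()])
        else:
            raise ValueError(f"unsupported op: {op}")
    return stack[-1]


# Hypothetical layout after linking and dynamic relocation.
VAR_ADDR = 0x601040   # where internal::i lives
GOT_SLOT = 0x600FF0   # .got slot holding the address of internal::i
memory = {GOT_SLOT: VAR_ADDR, VAR_ADDR: 42}

# Non-PIC style: a direct address constant.
non_pic = [(DW_OP_addr, VAR_ADDR)]
# PIC style: address of the .got slot, then one indirection.
pic = [(DW_OP_addr, GOT_SLOT), (DW_OP_deref, None)]

# Both expressions end up naming the same data cell.
assert evaluate(non_pic, memory) == evaluate(pic, memory) == VAR_ADDR
```

In this model both expressions yield the same address for accesses
from that scope, which is how I understand the "real truth for
accesses" in the quoted text.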