From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-1952-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 4028 invoked by alias); 12 Apr 2010 18:51:10 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 4007 invoked by uid 22791); 12 Apr 2010 18:51:07 -0000
X-SWARE-Spam-Status: No, hits=-5.4 required=5.0
	tests=BAYES_05,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,TW_OV,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Subject: Re: Cross-CU C++ DIE references vs. mangling
From: Sami Wagiaalla <swagiaal@redhat.com>
To: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>, archer@sourceware.org,
        Keith Seitz <keiths@redhat.com>
In-Reply-To: <20100311060305.B177A7D5E@magilla.sf.frob.com>
References: <20100310191833.GA2816@host0.dyn.jankratochvil.net>
	 <20100310193207.GA6147@host0.dyn.jankratochvil.net>
	 <20100311060305.B177A7D5E@magilla.sf.frob.com>
Content-Type: text/plain; charset="UTF-8"
Date: Mon, 12 Apr 2010 18:51:00 -0000
Message-ID: <1271098008.2901.211.camel@localhost>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2010-q2/txt/msg00008.txt.bz2

So after a few (really, many) reads of this email, I think I can
summarize the issues and solutions discussed in it. I just want to
make sure I understand the issue properly before filing a gcc
feature request. So, is this a correct summary?

The goal is to help gdb find the proper location of variables whose
declarations and definitions are split across CUs or .so files.

Why can't gdb do this by itself? Because:

- It requires searching all other CUs/.so files to locate the
  definition, which is inefficient and also inaccurate, since

- The scope of the declaration can be different from that of the
  definition (e.g. class members). If DW_AT_MIPS_linkage_name is
  available, it can be used to resolve this; however,

- if the definition is in a stripped DSO, there is indeed a definition
  (an ELF symbol), but there is no DW_AT_location pointing to it
  anywhere. Also,

- it is possible to have two distinct variables, defined in two
  separate .so files, with the same linkage name, e.g.:

> Consider:
> 
> 	$ g++ -g -c -fPIC -o foo1.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { int i; };')
> 	$ g++ -g -c -fPIC -o foo2.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { extern int i; }; int foo () { return internal::i; }')
> 	$ gcc -g -shared -o foo.so foo1.o foo2.o
> 	$ g++ -g -c -fPIC -o bar1.o -xc++ <(echo 'namespace internal { int i; };')
> 	$ g++ -g -c -fPIC -o bar2.o -xc++ <(echo 'namespace internal { extern int i; }; int bar () { return internal::i; }')
> 	$ gcc -g -shared -o bar.so bar1.o bar2.o
> 	$ eu-readelf -sr -winfo foo.so bar.so
> 
> Now imagine a program linking in both foo.so and bar.so.  There are
> two different things that are both separate but equal and both truly
> internal::i and both truly _ZN8internal1iE.  By any method, there is
> no one answer to, "What is internal::i?"  The only answers are
> context-specific.
> 
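
To make the "context-specific" point concrete, here is a minimal Python
model (not gdb code): each loaded object gets its own symbol table, and
the same mangled name appears in both. The LoadedObject class, the two
lookup functions and the addresses are hypothetical, and symbol
interposition is ignored for simplicity.

```python
from dataclasses import dataclass, field

MANGLED = "_ZN8internal1iE"  # internal::i in both libraries

@dataclass
class LoadedObject:
    """Hypothetical stand-in for one loaded ELF object and its symbols."""
    name: str
    symbols: dict = field(default_factory=dict)

# Invented load addresses; both objects define the same mangled name.
objects = [
    LoadedObject("foo.so", {MANGLED: 0x7F0000001000}),  # hidden-visibility copy
    LoadedObject("bar.so", {MANGLED: 0x7F0000201000}),  # default-visibility copy
]

def lookup_by_name(name):
    """Context-free lookup: every object that defines `name`."""
    return [(o.name, o.symbols[name]) for o in objects if name in o.symbols]

def lookup_in_context(name, obj_name):
    """Context-specific lookup: only the object containing the accessing code."""
    for o in objects:
        if o.name == obj_name:
            return o.symbols.get(name)
    return None

print(lookup_by_name(MANGLED))                    # two candidates, no single answer
print(hex(lookup_in_context(MANGLED, "foo.so")))  # 0x7f0000001000
```

A name-only lookup can only return the whole candidate list; picking one
requires knowing which object the accessing code belongs to.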

Proposed solution:

Teach the compiler to generate, for a non-defining declaration, a
DW_AT_location that is applicable in that DIE's scope. That location
expression would parallel the assembly generated for accesses to the
symbol:

> The key is that you can have the same(ish) relocs using the same
> symbols in the code and DWARF as assembled.  Then whatever happens
> in linking stages later should be the same[...]

So,

> For non-PIC code, the actual code looks like:
> 
> 	movl	_ZN8internal1iE(%rip), %eax
> 
> and the DWARF bit could look like:
> 
> 	.byte DW_OP_addr
> 	.quad _ZN8internal1iE
> 
[...]
> These get resolved at link time to absolute addresses, et voila.
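
A rough sketch of what the non-PIC expression above amounts to once the
relocation is applied: one DW_OP_addr opcode (0x03 in the DWARF
specification) followed by an 8-byte little-endian address on x86-64.
The helper names and the example address are made up for illustration.

```python
import struct

DW_OP_addr = 0x03  # opcode value from the DWARF specification

def encode_addr_expr(resolved_address):
    """Bytes of `.byte DW_OP_addr; .quad <sym>` after the linker has applied
    the relocation (x86-64: 8-byte little-endian address)."""
    return struct.pack("<BQ", DW_OP_addr, resolved_address)

def eval_addr_only(expr):
    """Evaluate an expression consisting of exactly one DW_OP_addr."""
    if len(expr) != 9 or expr[0] != DW_OP_addr:
        raise ValueError("expected a single DW_OP_addr operation")
    (address,) = struct.unpack_from("<Q", expr, 1)
    return address  # the variable's address, no indirection

expr = encode_addr_expr(0x601040)  # hypothetical link-time address
print(expr.hex())                  # 034010600000000000
print(hex(eval_addr_only(expr)))   # 0x601040
```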

And,

> In a PIC access, what the final code will actually do is not really
> related to anything about ELF symbols.  It's just memory indirection.
> The PIC code is:
> 
> 	movq	_ZN8internal1iE@GOTPCREL(%rip), %rax
> 	movl	(%rax), %eax
> 
[...]
> 	.byte DW_OP_addr
> 	.quad _ZN8internal1iE@GOT
> 	.byte DW_OP_deref
> 
> This generates R_X86_64_GOT64.  At link time, this too goes away and
> becomes the "absolute" address of the .got slot.  
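
And a toy evaluator for the PIC form, handling only DW_OP_addr (0x03)
and DW_OP_deref (0x06). It assumes we are past dynamic linker startup,
so the .got slot already holds the variable's address; the `memory`
dict stands in for reading the inferior, and the addresses (taken to
already include any load bias) are hypothetical.

```python
import struct

DW_OP_addr = 0x03   # push the following 8-byte address
DW_OP_deref = 0x06  # pop an address, push the 8-byte value stored there

def evaluate(expr, memory):
    """Toy DWARF stack machine supporting only DW_OP_addr and DW_OP_deref.
    `memory` maps addresses to 8-byte values (stand-in for the inferior)."""
    stack = []
    pc = 0
    while pc < len(expr):
        op = expr[pc]
        pc += 1
        if op == DW_OP_addr:
            (value,) = struct.unpack_from("<Q", expr, pc)
            pc += 8
            stack.append(value)
        elif op == DW_OP_deref:
            stack.append(memory[stack.pop()])
        else:
            raise NotImplementedError(f"unsupported opcode {op:#x}")
    return stack[-1]  # the location: the variable's address

# Hypothetical addresses. After dynamic linker startup the .got slot
# holds the address of whichever definition the symbol was bound to.
GOT_SLOT = 0x200FF8
VAR_ADDR = 0x201020
memory = {GOT_SLOT: VAR_ADDR}

pic_expr = struct.pack("<BQB", DW_OP_addr, GOT_SLOT, DW_OP_deref)
print(hex(evaluate(pic_expr, memory)))  # 0x201020
```

The consumer never needs an ELF symbol name here: it follows the same
indirection the code does, so it lands on whatever the .got slot was
bound to.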

The following part I don't quite understand:

> We could certainly teach GCC to do this.
> It would then be telling us more pieces of direct truth about the code.
> Would that not be the best thing ever?
> Well, almost.
> 
> First, what about a defining declaration in a PIC CU?  
> 
> In the abstract, a defining declaration can be considered as talking
> about two different things.  One is its declarationhood, wherein it
> says that the containing scope has this name visible.  For that
> purpose, it could reasonably be expected to be like a non-defining
> declaration: say how code in this scope accesses the variable--the
> truth about what's in the assembly code for any accesses in that CU.
> But the other thing is its definitionhood, wherein it says what data
> address contains the data cell and thus (optionally) implies what
> object file position holds the initializer image--another truth about
> what's in the assembly code for the definition in this CU.
> 
> In non-PIC code, these two truths match.  Both use direct address
> constants (as relocated at link time).  But in PIC code, the truth
> about the definition is an address constant, while the truth about the
> access is an indirection through .got.  (If you have PIC code that
> uses __attribute__((visibility("hidden"))) then it's direct access,
> though PC-relative, and thus "non-PIC" ("absolute") for DWARF
> purposes, so both truths match as in truly non-PIC code.)
> 
> Personally, I would be all for having it both ways.  In a CU where a
> defining declaration is actually used by PIC accesses, then you could
> generate a second non-defining declaration (even for C).  Give it
> DW_AT_artificial, DW_AT_declaration, DW_AT_specification pointing to
> the defining declaration (in lieu of DW_AT_name, DW_AT_type, et al),
> and then DW_AT_location with the PIC style using indirection.
> 
> With that, you could know that if you got a DW_AT_location from any
> DIE with DW_AT_declaration then you're done and have the real truth
> for accesses.  If we presume no CUs from pre-apocalyptic compilers now
> that we are in these here end times, then we are finally free from
> ever having to rely on discerning the right ELF symbol from a name we
> surmised from DWARF (be it via DW_AT_MIPS_linkage_name or mangling).
> 
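
If I read the "both ways" proposal right, the DIE pair and the
consumer-side rule would look roughly like this. Dicts stand in for
DIEs and the addresses are invented; falling back to the defining DIE's
location when no declaration carries one (the non-PIC case, where both
truths match) is my assumption, not part of the proposal.

```python
# Hypothetical DIEs as dicts: attribute names follow DWARF, values are invented.
defining = {
    "tag": "DW_TAG_variable",
    "DW_AT_name": "i",
    "DW_AT_type": "int",
    # definitionhood: where the data cell itself lives
    "DW_AT_location": (("DW_OP_addr", 0x201020),),
}

artificial = {
    "tag": "DW_TAG_variable",
    "DW_AT_artificial": True,
    "DW_AT_declaration": True,
    "DW_AT_specification": defining,  # in lieu of DW_AT_name, DW_AT_type
    # declarationhood: how this CU's PIC code reaches the variable, via .got
    "DW_AT_location": (("DW_OP_addr", 0x200FF8), ("DW_OP_deref",)),
}

def name_of(die):
    """Follow DW_AT_specification until a DIE carrying DW_AT_name is found."""
    while "DW_AT_name" not in die:
        die = die["DW_AT_specification"]
    return die["DW_AT_name"]

def access_location(dies):
    """Location that this CU's code uses to access the variable.

    A DW_AT_location on a DIE with DW_AT_declaration wins outright; falling
    back to the defining DIE's location is an assumption for the non-PIC case."""
    for die in dies:
        if die.get("DW_AT_declaration") and "DW_AT_location" in die:
            return die["DW_AT_location"]
    for die in dies:
        if "DW_AT_location" in die:
            return die["DW_AT_location"]
    return None

print(name_of(artificial))                      # i
print(access_location([defining, artificial]))  # the .got indirection
print(access_location([defining]))              # the plain address
```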

Why is there a need for a second, artificial, location-describing DIE?
As I understand it, declarationhood is specified by the DIE's nesting
in the DIE hierarchy, not by its DW_AT_location. In other words, what
is missing in the way gcc currently specifies locations for defining
declarations?

This summary does not cover the part of the email from "Before dynamic
linker startup" to the end, mainly because I am assuming that the main
use case is after dynamic linker startup.