From: Roland McGrath
To: Sami Wagiaalla
Cc: frysk
Subject: Re: Dwarf/libdw question
In-Reply-To: Sami Wagiaalla's message of Monday, 1 October 2007 11:36:18 -0400 <470113F2.5000105@redhat.com>
References: <470113F2.5000105@redhat.com>
Message-Id: <20071002022030.46E5E4D0544@magilla.localdomain>
Date: Tue, 02 Oct 2007 02:20:00 -0000

Hi Sami.  Please use more specific Subject lines in your postings.
Reading the list archives' index will not be very informative to someone
looking years from now for discussion on this particular topic.

> I am working on implementing c++ scoping rules in frysk.
> Is there elfutils API that I can use to figure out what class/struct a
> function belongs to, so that references to member variables can be
> resolved.

The key is DW_AT_specification.  Let's take an example:

    class c {
      int m1() { return 17; }
      int m2();
    public:
      int m() { return m1() + m2(); }
    };

    int c::m2() { return 23; }
    int
    main()
    {
      c x;
      return x.m();
    }

The DIE tree for this is (explanations below):

 [     b]  compile_unit
           macro_info           0
           stmt_list            0
           producer             "GNU C++ 4.1.2 20070502 (Red Hat 4.1.2-12)"
           language             C++ (4)
           name                 "s.cxx"
           comp_dir             "/home/roland/build/stock-elfutils"
 [    67]    structure_type
             sibling              [    d4]
             name                 "c"
             byte_size            1
             decl_file            1
             decl_line            2
 [    71]      subprogram
               sibling              [    94]
               external
               name                 "m1"
               decl_file            1
               decl_line            3
               MIPS_linkage_name    "_ZN1c2m1Ev"
               type                 [    d4]
               accessibility        private (3)
               declaration
 [    8d]        formal_parameter
                 type                 [    db]
                 artificial
 [    94]      subprogram
               sibling              [    b7]
               external
               name                 "m2"
               decl_file            1
               decl_line            4
               MIPS_linkage_name    "_ZN1c2m2Ev"
               type                 [    d4]
               accessibility        private (3)
               declaration
 [    b0]        formal_parameter
                 type                 [    db]
                 artificial
 [    b7]      subprogram
               external
               name                 "m"
               decl_file            1
               decl_line            6
               MIPS_linkage_name    "_ZN1c1mEv"
               type                 [    d4]
               declaration
 [    cc]        formal_parameter
                 type                 [    db]
                 artificial
 [    d4]    base_type
             name                 "int"
             byte_size            4
             encoding             signed (5)
 [    db]    pointer_type
             byte_size            8
             type                 [    67]
 [    e1]    subprogram
             sibling              [   10d]
             specification        [    71]
             low_pc               0x000000000040054c
             high_pc              0x000000000040055b
             frame_base           location list [     0]
 [    fe]      formal_parameter
               name                 "this"
               type                 [   10d]
               artificial
               location             2 byte block
                [   0] fbreg -24
 [   10d]    const_type
             type                 [    db]
 [   112]    subprogram
             sibling              [   13f]
             specification        [    94]
             decl_line            9
             low_pc               0x0000000000400528
             high_pc              0x0000000000400537
             frame_base           location list [    4c]
 [   130]      formal_parameter
               name                 "this"
               type                 [   10d]
               artificial
               location             2 byte block
                [   0] fbreg -24
 [   13f]    subprogram
             sibling              [   16b]
             specification        [    b7]
             low_pc               0x000000000040055c
             high_pc              0x0000000000400587
             frame_base           location list [    98]
 [   15c]      formal_parameter
               name                 "this"
               type                 [   10d]
               artificial
               location             2 byte block
                [   0] fbreg -32
 [   16b]    subprogram
             external
             name                 "main"
             decl_file            1
             decl_line            11
             type                 [    d4]
             low_pc               0x0000000000400538
             high_pc              0x000000000040054b
             frame_base           location list [    e4]
 [   18c]      variable
               name                 "x"
               decl_file            1
               decl_line            13
               type                 [    67]
               location             2 byte block
                [   0] fbreg -17

Note that the subprogram DIEs describing actual machine code are
top-level children of the CU.  Here these are [e1], [112], [13f].  They
are not children of [67], the structure_type DIE describing the class.
This is sensible enough because these are global function definitions,
even if they have names and types with scope limited to the class.

Consider [112].  This has the attributes and children that refer to its
machine code (low_pc, high_pc, frame_base, formal_parameter).  Note it
does not have the attributes like name and type.  Instead, it has a
specification attribute that points to [94].  specification is analogous
to abstract_origin, but rather than linking a concrete code element to
an abstract inline definition, it links a concrete code element to an
abstract declaration.  So, [112] is the code for "m2", and [94] is the
specification for "m2".

dwarf_attr_integrate checks for specification as well as
abstract_origin.  So, for common cases with attributes you just don't
think about it.  dwarf_diename uses dwarf_attr_integrate, so you will
see a name without extra effort even if it's indirect.

I used [112] as the example because m2 is defined outside the class
definition.  As you can see, GCC does the same thing for m1 [e1] and m
[13f], though those definitions actually appear lexically inside the
class.  Reading the DWARF spec one would expect these cases to use a
single DIE inside the class and not use DW_AT_specification at all.  I
don't know if there is a particular reason GCC doesn't do that, and I
see no big benefit in changing what it does.  But I think that DWARF
consumers should expect that either style might be used and work the
same with either.
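As a rough illustration of what attribute integration amounts to, here
is a toy model in C.  It is not libdw code: struct toy_die and
toy_name_integrate are made-up names for this sketch, and the lookup
only mirrors the idea of falling back through abstract_origin and
specification when a DIE lacks the attribute itself.  The two DIEs
stand for [94] and [112] from the dump above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a DIE.  This is NOT libdw's Dwarf_Die; every name
   in this sketch is hypothetical.  */
struct toy_die {
    unsigned offset;                        /* DIE offset, e.g. 0x112 */
    const char *name;                       /* DW_AT_name, NULL if absent */
    const struct toy_die *abstract_origin;  /* DW_AT_abstract_origin target */
    const struct toy_die *specification;    /* DW_AT_specification target */
};

/* Return the name of DIE, following abstract_origin or specification
   links while the DIE at hand has no name of its own.  The hop limit
   guards against malformed, cyclic references.  */
static const char *toy_name_integrate(const struct toy_die *die)
{
    assert(die != NULL);
    for (int hops = 0; die != NULL && hops < 16; hops++) {
        if (die->name != NULL)
            return die->name;
        die = die->abstract_origin != NULL ? die->abstract_origin
                                           : die->specification;
    }
    return NULL;
}

/* The m2 pair from the dump: declaration [94] and definition [112].  */
static const struct toy_die decl_m2 = { 0x94, "m2", NULL, NULL };
static const struct toy_die def_m2 = { 0x112, NULL, NULL, &decl_m2 };
```

In this model toy_name_integrate(&def_m2) finds the name on [94] even
though [112] carries none, which is the effect you get from
dwarf_diename without doing anything special.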

Note how [112] has a decl_line attribute but no decl_file, while [e1]
and [13f] have neither.  This is an example of the general rule with
specification (and abstract_origin): an attribute is elided from the
concrete DIE if its value is not different from the one on the DIE it
refers to.  Since m2's body was defined outside the class, [112] refers
to line 9.  If the class declaration were in a header file and the
method definition in another file, there would also be a decl_file
attribute.  (If everything were all on one line and the compiler
emitted column information, there would be a decl_column but no
decl_line.  The compiler does not yet emit decl_column attributes, but
we should write consumers as if it did.)  Since [e1] and [13f] describe
bodies defined in their selfsame specification declarations, they would
never have a decl_{file,line,column} of their own.

So now I've told you the basics to work with, but not actually answered
your question.  There are two parts to resolving class members: the
name resolution per se, and then deciding what to do with the DIE the
name resolved to.

For the name resolution, you start with the scopes inside a subprogram
DIE, same as in C.  When you are dealing with a class method, the
subprogram's specification attribute gives you the declaration inside
the class scope (use dwarf_formref_die (dwarf_attr (...))).  Then use
dwarf_getscopes_die on that to see the class, namespace, etc. scopes
containing it.  For each of those, see if they have DW_TAG_inheritance,
DW_TAG_imported_declaration, etc. children that contribute more scopes
to the name resolution logic for the language.  Among those you find a
member, variable, subprogram, etc. DIE by the name you are looking for.

If you found a static member (aka class variable), i.e. DW_TAG_variable,
you are done.  It gets treated just like other variable DIEs.

If you found a class member (aka instance variable), i.e. DW_TAG_member,
then it depends on how you plan to use it.  In the context of a pointer
to member (as "mem" in "type cl::*p = &cl::mem;"), you are done.  The
DW_AT_data_member_location tells you what value to use.
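The scope walk described above can be sketched on a toy tree.  Again
this is only a model under stated assumptions, not libdw: struct
toy_die, toy_add_child, toy_find_child and toy_resolve are hypothetical
names, the parent pointer stands in for what dwarf_getscopes_die would
report, and the sketch ignores both the scopes inside the subprogram
itself and any DW_TAG_inheritance or DW_TAG_imported_declaration
children.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_tag { TOY_COMPILE_UNIT, TOY_STRUCTURE_TYPE, TOY_SUBPROGRAM,
               TOY_MEMBER, TOY_VARIABLE };

/* Toy DIE with explicit tree links.  NOT libdw's Dwarf_Die.  */
struct toy_die {
    enum toy_tag tag;
    const char *name;               /* DW_AT_name, NULL if absent */
    struct toy_die *parent;         /* enclosing scope DIE */
    struct toy_die *child;          /* first child */
    struct toy_die *sibling;        /* next sibling */
    struct toy_die *specification;  /* DW_AT_specification target */
};

/* Link CHILD into PARENT's child list (order is not preserved).  */
static void toy_add_child(struct toy_die *parent, struct toy_die *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->sibling = parent->child;
    parent->child = child;
}

/* Find a direct child of SCOPE with the given name.  */
static struct toy_die *toy_find_child(struct toy_die *scope, const char *name)
{
    for (struct toy_die *d = scope->child; d != NULL; d = d->sibling)
        if (d->name != NULL && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Resolve NAME as seen from a method definition: hop to the declaration
   that specification points to, then search each enclosing scope,
   innermost first.  */
static struct toy_die *toy_resolve(struct toy_die *method_def,
                                   const char *name)
{
    struct toy_die *start = method_def->specification != NULL
                                ? method_def->specification
                                : method_def;
    for (struct toy_die *scope = start->parent; scope != NULL;
         scope = scope->parent) {
        struct toy_die *hit = toy_find_child(scope, name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

The point of the hop through specification is that the definition DIE
sits directly under the CU, so walking its own parents would never
reach the class scope.  Whether the hit is a TOY_MEMBER or a
TOY_VARIABLE then decides how it is used, as described above.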

In a static method (aka class method), referring to a regular class
member (instance variable) is invalid.  In an instance method, "mem" is
resolved the same as "this->mem".  The subprogram DIE for the method
definition contains an automatically-inserted first formal_parameter
DIE, with the artificial attribute and named "this".  AFAICT, the only
way to distinguish a static method from an instance method in the DWARF
tree is the presence of this first artificial formal_parameter.  (Though
in practice it always has the name attribute of "this", I would write it
to detect a first formal_parameter with artificial rather than looking
at the name.)  This formal_parameter is like any other aside from being
artificial, so you combine its location attribute with the PC context
you're looking from, and the data_member_location attribute of the
member DIE, to find the member in the object from that PC context.

When the name resolved to a subprogram DIE, you have to do two things to
see how to treat it.  First, if the DIE has DW_AT_declaration, then you
have to find the concrete code DIE whose DW_AT_specification points to
it.  Then, you have to check (as above) whether it's a static method or
an instance method, so you know what "name(foo)" is supposed to mean if
a user gave that as a call.


Thanks,
Roland
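
The two checks for a subprogram DIE described above can be sketched in
the same toy style.  Everything here is an assumption made for
illustration: struct toy_die, toy_is_instance_method and
toy_find_definition are hypothetical names, boolean fields stand in for
the presence of the DW_AT_artificial and DW_AT_declaration flags, and
the definition is looked for only among the sibling list it is handed
(for the dump above, the top-level children of the CU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_tag { TOY_SUBPROGRAM, TOY_FORMAL_PARAMETER, TOY_VARIABLE };

/* Toy DIE.  NOT libdw's Dwarf_Die.  */
struct toy_die {
    enum toy_tag tag;
    bool artificial;                      /* DW_AT_artificial present */
    bool declaration;                     /* DW_AT_declaration present */
    const struct toy_die *specification;  /* DW_AT_specification target */
    const struct toy_die *child;          /* first child */
    const struct toy_die *sibling;        /* next sibling */
};

/* An instance method is recognized by its first formal_parameter being
   artificial; the parameter's name is deliberately not consulted.  */
static bool toy_is_instance_method(const struct toy_die *subprogram)
{
    assert(subprogram != NULL && subprogram->tag == TOY_SUBPROGRAM);
    for (const struct toy_die *d = subprogram->child; d != NULL;
         d = d->sibling)
        if (d->tag == TOY_FORMAL_PARAMETER)
            return d->artificial;   /* only the first parameter counts */
    return false;
}

/* Starting at FIRST, scan a sibling list for the concrete subprogram
   whose specification points at DECL.  */
static const struct toy_die *
toy_find_definition(const struct toy_die *first, const struct toy_die *decl)
{
    for (const struct toy_die *d = first; d != NULL; d = d->sibling)
        if (d->tag == TOY_SUBPROGRAM && !d->declaration
            && d->specification == decl)
            return d;
    return NULL;
}
```

A real consumer would do the equivalent scan with the libdw child and
sibling iteration, and might also have to look beyond the immediate CU
children; the sketch only shows the shape of the two tests.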