From: Roland McGrath
To: Andrew Cagney
Cc: frysk@sourceware.org
Subject: Re: which elf symbol?
In-Reply-To: Andrew Cagney's message of Monday, 23 July 2007 12:58:01 -0400 <46A4DE19.4000200@redhat.com>
Message-Id: <20070726022738.4350E4D058D@magilla.localdomain>
Date: Thu, 26 Jul 2007 02:27:00 -0000

I reduced your report to an isolated test case of trivial assembly.
I've slightly modified addr2line so I'm using it as a test program with
-e on the .o file to just print out the results of dwfl_module_addrsym.
Please help me adjust this test case to match (or also include) cases
equivalent to what you are seeing.

        .globl t1_global_outer
t1_local_st_size_0:
t1_global_outer:
        nop
t1_local_in_global:
        nop
        .size t1_local_in_global, .-t1_local_in_global
1:      nop
        .size t1_global_outer, .-t1_global_outer

        .space 100
        .balign 8

        .globl t2_global_symbol
t2_local_st_size_0:
t2_global_symbol:
        nop
        nop
        .size t2_global_symbol, .-t2_global_symbol
2:

        .space 100
        .balign 8

        .globl t3_global_after_0
t3_global_after_0:
        nop
t3_local_0_in_global:
3:      nop
        .size t3_global_after_0, .-t3_global_after_0

        .data
t1_pc_of_interest:
        .long 1b
t2_pc_of_interest:
        .long 2b
t3_pc_of_interest:
        .long 3b

as on that produces the following (readelf -rs; objdump -d).
The reloc addends tell you the "pc_of_interest" values.
On i386 (non-rela), you'd need to look at objdump -s -j .data instead
to see them.

Relocation section '.rela.data' at offset 0x560 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  00010000000a R_X86_64_32       0000000000000000 .text + 2
000000000004  00010000000a R_X86_64_32       0000000000000000 .text + 6a
000000000008  00010000000a R_X86_64_32       0000000000000000 .text + d1

Symbol table '.symtab' contains 14 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 t1_local_st_size_0
     5: 0000000000000001     1 NOTYPE  LOCAL  DEFAULT    1 t1_local_in_global
     6: 0000000000000068     0 NOTYPE  LOCAL  DEFAULT    1 t2_local_st_size_0
     7: 00000000000000d1     0 NOTYPE  LOCAL  DEFAULT    1 t3_local_0_in_global
     8: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    2 t1_pc_of_interest
     9: 0000000000000004     0 NOTYPE  LOCAL  DEFAULT    2 t2_pc_of_interest
    10: 0000000000000008     0 NOTYPE  LOCAL  DEFAULT    2 t3_pc_of_interest
    11: 0000000000000000     3 NOTYPE  GLOBAL DEFAULT    1 t1_global_outer
    12: 0000000000000068     2 NOTYPE  GLOBAL DEFAULT    1 t2_global_symbol
    13: 00000000000000d0     2 NOTYPE  GLOBAL DEFAULT    1 t3_global_after_0

addrsym-test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :
   0:   90                      nop

0000000000000001 :
   1:   90                      nop
   2:   90                      nop
        ...
  67:   90                      nop

0000000000000068 :
  68:   90                      nop
  69:   90                      nop
        ...
  ce:   66 90                   xchg   %ax,%ax

00000000000000d0 :
  d0:   90                      nop

00000000000000d1 :
  d1:   90                      nop

It's possible the order of symbols in the table is an issue.
So let me know if those are different in your case.

> local_st_size_0: // this symbol has no size
>
> global_outer:
>     nop
> local_in_global:
>     nop
>     .size local_in_global, .-local_in_global
>     nop
>     <>
>     .size global_outer, .-global_outer
>
> that is global_outer contains a nested symbol but the "pc" is beyond
> that back in the outer/global symbol.
>
> I'm guessing that "global_outer" should be returned.  Currently
> local_st_size_0 is returned :-(

Arguably the answer should be no symbol, since the address is past the
end of the nearest symbol's size.

This is t1 in my test case, looking at pc=0x2.
I get t1_global_outer+0x2 from addrsym here, so my case
must differ from what you tried here.  Can you figure out how they differ?

> This is the no-symbol case, there is a hole in the memory where there is
> no valid symbol vis:
>
> local_st_size_0: // this symbol has no size
>
> global_symbol:
>     nop
>     nop
>     .size global_symbol, .-global_symbol
>
> << you are here >>
>
> I'm guessing it should not get a symbol at all (the [unknown]).  It
> currently gets the nearest unsized symbol.

This is t2 in my test case, looking at pc=0x6a.
I get t2_local_st_size_0+0x2 here.  Go figure.

You wrote your marker in a different place, but there is no address
difference between the context before the .size directive in the assembly
source and the context after it.  So unless I'm misunderstanding what cases
you intended to describe, this should be the same case as t1.
(It doesn't really matter that there is a local symbol nearby, since the PC
of interest is unambiguously outside that symbol's address range.)

I'll look into why they come out differently, which might relate to some
other symptom.  I also agree that the right answer is no symbol.

> These are cases where there is a nested symbol within a sized symbol vis:
>
> global_after_0:
>     nop
> local_0_in_global:
>     << you are here >>
>     nop
>     .size global_after_0, .-global_after_0
>
> here, since the PC is exactly at the unsized local symbol I'm guessing
> that it should return that.  It currently gets the containing sized symbol.

This is my t3.  I get t3_global_after_0+0x1 as you say.
This is the intended behavior, not a bug.
We can discuss what the behavior should be.

  /* Handwritten assembly symbols sometimes have no st_size.  If no
     symbol with proper size includes the address, we'll use the closest
     one that is in the same section as ADDR.  */

Usually size-0 symbols are local assembler labels, and sized symbols are
the entry points.  For things like backtraces, people usually want to see
the symbol names for the entry points (plus offsets) rather than the local
labels that often have unhelpful names.  That's what I had in mind when I
wrote that.  addrsym started out as addrname, which does not pass back an
offset and was only used for "what function is this in?" kinds of queries.
For that, it clearly makes more sense to prefer the containing sized symbol.
However, for things like disassembly, people probably would like to see the
local label names, or perhaps both names.


Thanks,
Roland
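
For reference, here is a rough model of the policy described in the
comment quoted above: prefer a sized symbol that contains the address,
otherwise fall back to the closest size-0 symbol in the same section.
This is only a sketch, not the libdwfl code.  The names struct sym,
symtab and model_addrsym are made up for illustration, and two details
are assumptions: the tie-break among several containing sized symbols
(highest start address wins) and the restriction of the fallback to
size-0 symbols.  The table is the .symtab from the test case, minus the
null and section symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ELF symbol table entry.  */
struct sym
{
  const char *name;
  uint64_t value;   /* st_value */
  uint64_t size;    /* st_size, 0 for unsized labels */
  int shndx;        /* section index: 1 = .text, 2 = .data */
};

/* The named symbols from the readelf output of the test case.  */
static const struct sym symtab[] = {
  { "t1_local_st_size_0",   0x00, 0, 1 },
  { "t1_local_in_global",   0x01, 1, 1 },
  { "t2_local_st_size_0",   0x68, 0, 1 },
  { "t3_local_0_in_global", 0xd1, 0, 1 },
  { "t1_pc_of_interest",    0x00, 0, 2 },
  { "t2_pc_of_interest",    0x04, 0, 2 },
  { "t3_pc_of_interest",    0x08, 0, 2 },
  { "t1_global_outer",      0x00, 3, 1 },
  { "t2_global_symbol",     0x68, 2, 1 },
  { "t3_global_after_0",    0xd0, 2, 1 },
};

/* Model of the lookup: return the symbol chosen for ADDR in section
   SHNDX, or NULL if nothing in that section starts at or below ADDR.  */
static const struct sym *
model_addrsym (uint64_t addr, int shndx)
{
  const struct sym *sized = NULL;    /* best sized symbol containing ADDR */
  const struct sym *unsized = NULL;  /* closest size-0 symbol at or below ADDR */

  for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; ++i)
    {
      const struct sym *s = &symtab[i];
      if (s->shndx != shndx || s->value > addr)
        continue;
      if (s->size > 0)
        {
          /* Assumption: among containing symbols, the innermost wins.  */
          if (addr - s->value < s->size
              && (sized == NULL || s->value > sized->value))
            sized = s;
        }
      else if (unsized == NULL || s->value > unsized->value)
        unsized = s;
    }

  /* A containing sized symbol is preferred over any size-0 label.  */
  return sized != NULL ? sized : unsized;
}
```

Calling model_addrsym with section 1 and the three addresses of interest
picks t1_global_outer for 0x2, t2_local_st_size_0 for 0x6a and
t3_global_after_0 for 0xd1, the same names reported above.  Returning
"no symbol" for t2 would mean dropping the size-0 fallback when the
address lies past the end of every sized symbol.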