From: Roland McGrath
To: Josh Stone
Cc: systemtap@sources.redhat.com
Subject: Re: semantic error: multiple addresses for ...
In-Reply-To: Josh Stone's message of Tuesday, 24 August 2010 13:34:42 -0700 <4C742CE2.7070408@redhat.com>
References: <20100820203946.0169540144@magilla.sf.frob.com> <4C742200.7000401@redhat.com> <20100824195519.9C0D74048C@magilla.sf.frob.com> <4C742CE2.7070408@redhat.com>
Message-Id: <20100824231539.4C2544048C@magilla.sf.frob.com>
Date: Tue, 24 Aug 2010 23:15:00 -0000

> On 08/24/2010 12:55 PM, Roland McGrath wrote:
> > Off hand, I see two potential approaches:
> >
> > 1. Pay attention to is_stmt.
> >    I mentioned this earlier.  To consider this, we'd really need to find
> >    some cases where the current code legitimately complains as in PR1306,
> >    and check whether the is_stmt flags in that DWARF info are useful.
>
> I'm not sure how to check is_stmt, but here's a reproducer at a similar
> location to the original bug report.

dwarf_linebeginstatement is the call.  There is no easy way to see it in
readelf output; you have to use eu-readelf --debug-dump=line and grok the
line program by eyeball.  I just added the -F (--flags) option to
eu-addr2line to show that kind of stuff.  It shows:

	$ ./src/addr2line -k -F 0xffffffff810fe83d 0xffffffff810fe849
	fs/open.c:1053 (is_stmt)
	fs/open.c:1053 (is_stmt)

So both line entries do have the is_stmt flag, and it's not telling us
anything useful.  In fact, an eyeball check on --debug-dump=line output
shows that the compiler sets is_stmt to true and never toggles it anywhere.

This reminds me that I think this is something the GCC folks are planning
to work on in the near future, almost certainly motivated by similar
concerns for other debuggers (i.e. GDB).  (I might have remembered that
from a discussion we happened to have elsewhere in the last week or two,
but I didn't.)
So, not something we can use today.

> Isn't it possible for a line to legitimately occur multiple times in the
> same scope?  For example, a statement within an unrolled loop *should*
> get multiple probes after all, right?

Yes, that's true.  But since currently we are breaking all legitimate
cases, including ones with simple concrete examples, breaking only this
subset of legitimate cases (of which we have no known examples off hand)
is clearly an improvement.  For that situation, each code site would have
is_stmt true (even if correctly tracked).  So, if, at some future date,
we can rely on is_stmt flags being useful, then doing that alone will
probably be the right choice.  (The definition of is_stmt in DWARF
actually is exactly "is a recommended breakpoint location."  It really is
intended for precisely what we want.)

What seems far more likely today (and is in my concrete test case, albeit
a synthetic one for unrelated purposes) is to have multiple inlined
instances of the same inline function inside the same actual function.
In this case, the innermost scope of each is the inlined_subroutine, and
their PC ranges will usually be disjoint.  (Note this should be so even
if their code is interleaved--each will have noncontiguous PC sets that
indicate the exact interleaving.)

Of course, there are lots of optimization possibilities.  But this is a
heuristic to work with the limited information we have.  Ultimately, the
best solution is for the compiler to decide where the breakpoint for a
source location belongs, and that's exactly what is_stmt is for.  But we
don't have the compiler support for that yet.

The status quo is that we punt all cases to the user to work around
manually.  So any heuristic to reduce the cases we punt is safe as long
as it errs on the side of punting cases that might not be intended to
have multiple matching PCs.  To start with, cases in different innermost
scopes that have disjoint ranges seem safe in this sense.
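
A rough sketch of that starting heuristic, in Python purely for
illustration.  This is not translator code: Scope, Match, and
safe_to_probe_all are hypothetical names, and the PC ranges are assumed
to be half-open (low, high) pairs already extracted from the DWARF info.
Multiple matching PCs are accepted only when every pair sits in a
different innermost scope and those scopes have disjoint ranges;
everything else is punted, as today.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Scope:
    """Hypothetical stand-in for an innermost DIE scope."""
    die_offset: int   # identifies the DIE (assumed unique per scope)
    ranges: tuple     # half-open (low_pc, high_pc) pairs, maybe noncontiguous


@dataclass(frozen=True)
class Match:
    """One PC that the source line resolved to, with its innermost scope."""
    pc: int
    scope: Scope


def ranges_disjoint(a, b):
    """True if no PC range of scope a overlaps any PC range of scope b."""
    return all(a_hi <= b_lo or b_hi <= a_lo
               for a_lo, a_hi in a.ranges
               for b_lo, b_hi in b.ranges)


def safe_to_probe_all(matches):
    """Accept multiple PCs only for distinct scopes with disjoint ranges."""
    for m1, m2 in combinations(matches, 2):
        if m1.scope.die_offset == m2.scope.die_offset:
            return False   # same innermost scope: keep punting
        if not ranges_disjoint(m1.scope, m2.scope):
            return False   # overlapping scopes: keep punting
    return True


# Two inlined instances whose code is interleaved but never overlaps.
inst_a = Scope(0x100, ((0x10, 0x20), (0x30, 0x40)))
inst_b = Scope(0x200, ((0x20, 0x30), (0x40, 0x50)))
inlined = [Match(0x14, inst_a), Match(0x24, inst_b)]

# Two PCs for one line inside a single scope (the unrolled-loop case).
unrolled = [Match(0x14, inst_a), Match(0x34, inst_a)]
```

Here safe_to_probe_all(inlined) is true and safe_to_probe_all(unrolled)
is false, so the unrolled-loop case stays among the ones we punt.
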
As we come across other concrete cases where we are punting when we
should probe, we can consider adding to the heuristic.

> The funny thing is that the current rule is only applied for statement
> probes on line numbers (see the use of need_single_match).  Function
> probes with line numbers are allowed to have duplicates, probably for
> the exact reason you're complaining about, that there may be multiple
> inline instances.

The source location in .function is a very different animal.  It's really
not about finding a particular PC at all.  It's about disambiguating
which source function you meant, if there might be multiple by the same
name (e.g. statics in different CUs of a module) or you use wildcards for
some reason.  So a very different logic applies.  (I'm not saying that
what the translator does today necessarily comports with such a logic as
we would articulate it.  But it should not be a surprise if it's closer
to that ideal logic than to the logic used to resolve .statement, either
the ideal logic for that or the manifest behavior of .statement as it is.)

The source locations yield PCs, and those yield concrete DIE scopes.  The
innermost subprogram or inlined_subroutine scope is the one that is
relevant to identifying the function.  Each inlined_subroutine is
relevant only through its abstract_origin attribute, which leads to a
subprogram.  If that final set of subprogram DIEs is all just the same
one subprogram, then there is no ambiguity in the match.

If you have matches in multiple CUs and all of them are an
identical-looking subprogram (i.e. same name, signature, and same source
location after canonicalizing source file names), then that too is an
unambiguous single match.  In that case, it's a "single" match of an
inline function defined in a header file or something like that, where
getting all the inlined instances across all the CUs is exactly what's
intended (including some that call the source file foo.h and some that
call it ../foo/foo.h, etc.).

Thanks,
Roland
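
To make the .function matching rule above concrete, here is a small
Python sketch.  It is an illustration under assumptions, not what the
translator does: the Die class and its fields are hypothetical stand-ins
for DWARF DIEs, the signature is assumed to be available as a string,
and file names are canonicalized by resolving them against an assumed
per-CU compilation directory.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Die:
    """Hypothetical, simplified DIE: only the fields the rule needs."""
    tag: str                    # "subprogram" or "inlined_subroutine"
    name: str = ""
    signature: str = ""
    comp_dir: str = ""          # compilation directory of the owning CU
    decl_file: str = ""         # file name as the CU spelled it
    decl_line: int = 0
    abstract_origin: Optional["Die"] = None


def resolve_subprogram(die):
    """Follow abstract_origin from an inlined instance to its subprogram."""
    while die.tag == "inlined_subroutine" and die.abstract_origin is not None:
        die = die.abstract_origin
    return die


def identity(die):
    """Name, signature, and canonicalized source location of the function."""
    sub = resolve_subprogram(die)
    path = posixpath.normpath(posixpath.join(sub.comp_dir, sub.decl_file))
    return (sub.name, sub.signature, path, sub.decl_line)


def is_unambiguous(dies):
    """True when every match is the same function in the above sense."""
    return len({identity(d) for d in dies}) == 1


# The same header-defined inline, seen from two CUs that name the file
# differently, plus an inlined instance of the first one.
in_foo = Die("subprogram", "get_flags", "int (void)", "/src/foo", "foo.h", 12)
in_bar = Die("subprogram", "get_flags", "int (void)", "/src/bar",
             "../foo/foo.h", 12)
inlined = Die("inlined_subroutine", abstract_origin=in_foo)

# A different function that merely shares the name.
static_other = Die("subprogram", "get_flags", "int (void)", "/src/bar",
                   "bar.c", 40)
```

With these definitions, is_unambiguous([inlined, in_bar]) is true, while
is_unambiguous([in_foo, static_other]) is false because the source
locations differ.
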