Date: Sat, 09 Jan 2010 02:46:00 -0000
From: "K.Prasad" <prasad@linux.vnet.ibm.com>
Reply-To: prasad@linux.vnet.ibm.com
To: Roland McGrath
Cc: Prerna Saxena, systemtap@sourceware.org
Subject: Re: [RFC] Systemtap translator support for hardware breakpoints on
Message-ID: <20100109024616.GA6810@in.ibm.com>
References: <4B459CC8.2030402@linux.vnet.ibm.com> <20100108015301.72EF67300@magilla.sf.frob.com> <20100109011140.GB3486@in.ibm.com> <20100109014457.EF791CC@magilla.sf.frob.com>
In-Reply-To: <20100109014457.EF791CC@magilla.sf.frob.com>
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
On Fri, Jan 08, 2010 at 05:44:57PM -0800, Roland McGrath wrote:
> > If my understanding is correct, this is a suggestion that demands an
> > 'overcommit' feature (ability to accept requests more than the
> > available debug registers) in hw-breakpoints, right?
>
> Actually that's exactly not what I was talking about.  It's an
> interesting subject to consider (I was the one who originally
> proposed such complexity for hw_breakpoint at its inception).
>
> But all I was talking about here is what a stap module can do when an
> "allocate it now and forever" call fails.  (This might be at script
> startup time, or might be later during a module's lifetime if the
> putative script-driven dynamic registration feature were used.)
>

Yes, I remember reading through the discussions you had with Alan
Stern, as a result of which his code originally had the over-commit and
prioritisation features; they were subsequently removed during a
re-design demanded by the community/maintainer. Post perf-events
integration, over-commit is available in the form of 'un-pinned'
requests.

> > In its new form (post perf-events integration), hw-breakpoints can
> > indeed accept new requests that far exceed the number of underlying
> > debug registers. This can be achieved by making an 'un-pinned'
> > breakpoint request, where every such request gets a chance to use the
> > debug register in a round-robin fashion (all this is provided by the
> > perf-events infrastructure anyway).
>
> I don't understand what "round-robin" means for breakpoints.  When
> you let the kernel execute normally again after registration, either
> my breakpoint is enabled or it isn't.  There is no meaningful sense
> in which you can "time-share" a hardware breakpoint slot.  (That is,
> except for doing per-task registrations, which is a different
> semantics entirely.)
As I said before, it is a 'feature' 'acquired' by breakpoints by virtue
of using the perf-events layer underneath. One can think of it as
remotely useful for profiling reads/writes to a large number of
addresses simultaneously, to 'statistically' indicate a hit-count for
them (reference: LKML message-id 1248856132.6987.3034.camel@twins and
the related thread); it would, however, be disastrous to use in a
debugging role.

> > Presently, the breakpoint infrastructure does not provide callbacks
> > that can be invoked whenever an 'un-pinned' breakpoint request is
> > scheduled-in/out (analogous to .enabled and .disabled). We could
> > pursue support for the same (of course, that would require a good
> > in-kernel user to convince the community!).
>
> I am having trouble imagining what any kind of "scheduled-in/out"
> could possibly be useful to do at all if you don't notify me about
> whether or not my breakpoint is in place.  A "best effort"
> breakpoint, that might be caught and might not be and who knows
> whether it's really installed, is just not useful.  I must be
> missing the essence of what you mean.
>

Unfortunately, what you've described above as a "best effort"
breakpoint is exactly what an 'un-pinned' breakpoint now is. Again, it
is a carry-over from perf-events, where that behaviour suits the other
performance monitoring counters (provided by the PMU on Intel x86
processors) just fine. There is no in-kernel user of 'un-pinned'
breakpoint requests at present; perhaps when such a need arises, a
notifier/callback implementation during sched-in/out will follow.

> The thing I had talked about before was each hw_breakpoint
> registration having a priority number.  When another registration
> comes along with a higher priority, it can boot yours and make a
> callback so you know it's been stolen.  Conversely, when a competing
> registration goes away, the highest-priority registration-in-waiting
> gets installed and gets a callback to tell you it's now active.
> (I guess really there could just be a single notifier list that gets
> called when a slot becomes free, so the suitors can try again to see
> whose priority wins.  Whatever.)  But you've said this is not what
> it does now.
>

Yes, priority-based hw-breakpoint registration (and the queueing of new
requests) is no longer there.

> Anyway, that kind of dynamicism is not what I was talking about here.
> If we had it, the script-language features might look rather similar
> and so work how you're talking about here.  i.e., the .enabled and
> .disabled probes firing due to an hw_breakpoint layer callback that is
> "spontaneous" to the stap module's eyes, i.e. driven by the ebb and
> flow of external demand on the scarce shared resource.
>
> In the examples I gave, the .unavailable sub-probe in the simplest
> form is just a translation-time (tapset sugar) way to do some implicit
> script-level stuff at 'probe begin' time, but conditional on whether
> the associated hw_breakpoint registration at startup succeeded.
>
> In the example with dynamic (i.e. runtime script-controlled)
> registration, the .begin and .end sub-probes are just tapset sugar for
> some script-level stuff to do implicitly when script code uses "enable
> watch_foo" and "disable watch_foo".  For that use, you might have a
> .unavailable sub-probe that runs when "enable" fails, or you might
> just have "enable" return a testable success/failure to the script
> code or whatnot.
>

While the hw-breakpoint APIs could originally be invoked from most
contexts, I'm unsure whether that still holds post perf-events
integration. As Prerna stated, it needs to be verified whether the
(un)register_<> breakpoint interfaces can be successfully invoked from
a kprobe handler context.

> Given all that, I can imagine wanting the tie-in for some kind of
> "breakpoint scheduling" to be that "enable" has "enable-now-or-fail"
> and "enable-now-or-later" variants.
> Then in the "now or later" variant, those same .begin and .end probes
> might get run during the lifetime of a script (due to an
> hw_breakpoint layer callback) rather than only at "enable foo" and
> "disable foo" time.
>

The kernel hw-breakpoint API supports only an "enable-now-or-fail"
variant, and I think the script cannot implement "enable-now-or-later"
without kernel support (to be notified of a vacated debug register).
This means that all 'pinned' breakpoint requests beyond HBP_NUM
(HBP_NUM = the number of debug registers per CPU) would always fail and
must fall back to the .unavailable probe. The failure can happen even
earlier if breakpoint requests pre-exist.

So, is the .unavailable probe's role just to warn the user that the
request has failed? Or do you see more uses for it (that aren't
obvious to me)?

Thanks,
K.Prasad
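P.S. To make sure I've understood the proposed sugar, here is roughly
how I imagine a script using it would look. The probe-point spelling is
illustrative only (borrowed from the shape of Prerna's RFC), the watched
address is a placeholder, and .unavailable is the hypothetical sub-probe
under discussion -- none of this is an existing tapset:

```
/* Hypothetical sketch -- not existing stap syntax */
probe kernel.data(0xc06c69b0).write
{
        printf("address written by %s\n", execname())
}

probe kernel.data(0xc06c69b0).write.unavailable
{
        /* runs at startup if all HBP_NUM debug registers
           were already taken and registration failed */
        printf("watchpoint could not be armed\n")
        exit()
}
```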