Date: Fri, 16 Nov 2007 22:03:00 -0000
From: Mathieu Desnoyers
To: "Frank Ch. Eigler"
Cc: ltt-dev@shafik.org, Systemtap List
Subject: Re: patches to actually use markers?
Message-ID: <20071116220314.GA3197@Krystal>
In-Reply-To: <20071116204149.GD25326@redhat.com>

* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
>
> On Fri, Nov 16, 2007 at 03:35:39PM -0500, Mathieu Desnoyers wrote:
> > [...]
> > > I see. Yes, per-systemcall markers would be welcome by our group, and
> > > ones not dependent on TIF_TRACE or whatnot even more so. But were
> > > trying not to get too optimistic.
> >
> > I use per-systemcall markers for the principally useful systemcalls, but
> > I also instrument syscall_trace() to get all the other syscalls (new
> > ones, etc..).
>
> So then some system calls would get duplicate trace reports, and some
> would not get arguments at all? Does not sound ideal.
>

We currently have three distinct events for a system call:

- syscall entry, with syscall id and instruction pointer
- the syscall-specific instrumentation (optional)
- syscall exit

One benefit of having syscall entry/exit events carry minimal
information is that we can put them really close to the "real" event,
i.e. the transition from userspace to kernel space.
It becomes useful when people want a precise accounting of kernel vs
userspace time: the results will then be as close as possible to those
taken by a profiler. Passing limited information to the syscall
entry/exit instrumentation helps in knowing the number of cycles
wrongly accounted. We do not currently adjust the statistics to take
that into account, but we plan to do so in the future. Anything more
complicated could cause the number of wrongly accounted cycles to vary
from one event to the next, which is unwanted.

Instrumentation within the syscall-specific function helps in knowing
when/if the operation has really been done _within the kernel_. It may
imply putting the event within the bounds of existing locks, to be as
sure as possible that two related events happening on different CPUs
won't be recorded in the wrong order. Ideally, the instrumentation of
the syscall's "effect on the internal data structures of the kernel"
should be as close as possible to the actual memory modification.

Given these two opposite sets of constraints, I think having more than
one instrumentation site per syscall makes sense. Moreover, markers are
really cheap... :)

> > I add my own TIF_KERNEL_TRACE, which is a thread flag enabled in
> > each and every thread when tracing is active. [...]
>
> Who has responsibility to manage this flag? Would it be reference
> counted, so that e.g. two ltt and a third systemtap script all hook
> up to these markers, the flag will will stay set? It would be nice to
> measure the impact of ordinary, unconditional markers in the
> system-call functions.
>

Already did. For inactive markers under high memory pressure, we must
do 2 memory reads (that's the cycle difference we get). If they are in
cache, it's hard to see a difference. I think I've documented that in
the markers or immediate values patch header. For active markers, I did
some testing a while ago; I could dig through the ML to find these
results.

Yes, refcount would be the way to go.
The code is currently in kernel/sched.c, since it touches the threads.
I would have to add the refcount. It will be in the next LTTng
prerelease.

> > > If "we" is a marker callback function that is given the system call
> > > number, it can be taught. This is the sort of thing we do currently
> > > in systemtap script code based upon kprobes.
> >
> > Yeah.. but I fear that within the kernel it can become quickly very
> > ugly.
>
> It's an inherent tradeoff between a small generic hook versus many
> specialized hooks. Look how the audit system deals with decoding
> syscalls. It's not THAT bad.
>

Hrm, it's just that it centralizes something that would be better left
to each subsystem's expert: which information specific to a given
system call is interesting, and when is the best moment to record it.
Just as I would leave to the architecture experts the final word on
when it's best to record the system call entry/exit event.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
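
For illustration only, the reference counting discussed above for
TIF_KERNEL_TRACE could look roughly like the sketch below. It is a
single-threaded userspace model, not the LTTng code: the function
names, the fixed thread table and the lack of locking are all
assumptions made for the example. A kernel version would have to
walk the real thread list and serialize against thread creation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the real flag lives in each thread's flags
 * word and would be updated under proper locking; neither is modeled. */
#define NR_THREADS 4

static bool thread_trace_flag[NR_THREADS]; /* models TIF_KERNEL_TRACE */
static int trace_users;                    /* number of active tracers */

static void set_flag_on_all_threads(bool on)
{
	for (int i = 0; i < NR_THREADS; i++)
		thread_trace_flag[i] = on;
}

/* Called by each tracer when it starts; only the first user sets the flag. */
void trace_flag_get(void)
{
	if (trace_users++ == 0)
		set_flag_on_all_threads(true);
}

/* Called by each tracer when it stops; only the last user clears the flag. */
void trace_flag_put(void)
{
	assert(trace_users > 0); /* an unbalanced put is a caller bug */
	if (--trace_users == 0)
		set_flag_on_all_threads(false);
}

/* Helper for inspection: is the flag set for (modeled) thread i? */
bool trace_flag_is_set(int i)
{
	return thread_trace_flag[i];
}
```

With two ltt tracers and one systemtap script each calling
trace_flag_get(), the flag stays set until the third matching
trace_flag_put(), which is the behaviour asked about in the quoted
question.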