From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1085 invoked by alias); 28 Apr 2009 17:01:32 -0000 Received: (qmail 32403 invoked by uid 22791); 28 Apr 2009 17:01:29 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_13,J_CHICKENPOX_75,SPF_HELO_PASS,SPF_PASS X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 28 Apr 2009 17:01:20 +0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n3SH1IY9006741 for ; Tue, 28 Apr 2009 13:01:18 -0400 Received: from ns3.rdu.redhat.com (ns3.rdu.redhat.com [10.11.255.199]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n3SH1Ie4028302 for ; Tue, 28 Apr 2009 13:01:18 -0400 Received: from ton.toronto.redhat.com (ton.yyz.redhat.com [10.15.16.15]) by ns3.rdu.redhat.com (8.13.8/8.13.8) with ESMTP id n3SH1HpH003315; Tue, 28 Apr 2009 13:01:18 -0400 Received: from ton.toronto.redhat.com (localhost.localdomain [127.0.0.1]) by ton.toronto.redhat.com (8.13.1/8.13.1) with ESMTP id n3SH1HQI031221; Tue, 28 Apr 2009 13:01:17 -0400 Received: (from fche@localhost) by ton.toronto.redhat.com (8.13.1/8.13.1/Submit) id n3SH1HDT031220; Tue, 28 Apr 2009 13:01:17 -0400 Date: Tue, 28 Apr 2009 17:01:00 -0000 From: "Frank Ch. Eigler" To: systemtap@sources.redhat.com, utrace-devel@redhat.com Subject: syscall tracing overheads: utrace vs. kprobes Message-ID: <20090428170117.GA30001@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: systemtap-owner@sourceware.org X-SW-Source: 2009-q2/txt/msg00459.txt.bz2 Hi - In a few contexts, it comes up as to whether it is faster to probe process syscalls with kprobes or with something higher level such as utrace. (There are other hypothetical options too (per-syscall tracepoints) that could be measured this way in the future.) It was time to check the intuitions about the overheads. So, choosing a syscall that won't get short-circuited via vdso: % cat foo.c #include int main () { unsigned c; for (c=0; c<10000000; c++) (void) close (1000); } % gcc foo.c Now we compare these scenarios: # stap -e 'probe never {}' -t --vp 00001 -c a.out Here, no actual probing occurs so we get a measurement of the plain uninstrumented run time of ten million close(2)s. # stap -e 'probe process.syscall {}' -t --vp 00001 -c a.out Here, we intercept sys_close with a kprobe. If the system is not too busy, we should pick up only the close(2)s coming from a.out, though a few close(2)'s executed by other processes may show up. # stap -e 'probe syscall.close {}' -t --vp 00001 -c a.out Here, we intercept all a.out's syscalls with utrace. Other processes are not affected at all, but other syscalls by a.out would be -- though in our test, there are hardly any of those. Some typical results on my 2.66GHz 2*Xeon5150 machine runnin Fedora 9 - 2.6.27.12: never: Pass 5: run completed in 740usr/3310sys/4155real ms. kprobe: probe syscall.close (:1:1), hits: 10000028, cycles: 176min/202avg/3632max Pass 5: run completed in 750usr/9320sys/10193real ms. utrace: probe process.syscall (:1:1), hits: 10000025, cycles: 176min/209avg/184392max Pass 5: run completed in 1670usr/6860sys/8645real ms. So utrace added 4.5 seconds, and kprobes added 6.0 seconds to the uninstrumented 4.1 second run time. But wait: we should subtract the time taken by the probe handler itself: 200ish cycles at 2.66 GHz, which is about 0.75 seconds. So the overheads are approximately: never: n/a kprobe: 5.2 seconds => 0.52 us per hit utrace: 3.6 seconds => 0.36 us per hit Note that these are microbenchmarks that represent an ideal case compared to a larger run, since they probably fit comfily inside caches. They probably also undercount the probe handler's run time. - FChE