From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14087 invoked by alias); 25 Nov 2016 17:22:05 -0000 Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: systemtap-owner@sourceware.org Received: (qmail 14071 invoked by uid 89); 25 Nov 2016 17:22:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=sk:softwar, realtime, real-time, contention X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 25 Nov 2016 17:22:03 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9E368C057FA6; Fri, 25 Nov 2016 17:22:02 +0000 (UTC) Received: from fche.csb (vpn-53-177.rdu2.redhat.com [10.10.53.177]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uAPHM2BZ015203; Fri, 25 Nov 2016 12:22:02 -0500 Received: by fche.csb (Postfix, from userid 2569) id AAD7158AF5; Fri, 25 Nov 2016 12:22:01 -0500 (EST) From: fche@redhat.com (Frank Ch. Eigler) To: Aubrey Li Cc: systemtap@sourceware.org Subject: Re: kernel function probe overhead References: Date: Fri, 25 Nov 2016 17:22:00 -0000 In-Reply-To: (Aubrey Li's message of "Sat, 26 Nov 2016 00:07:51 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-SW-Source: 2016-q4/txt/msg00088.txt.bz2 aubreylee wrote: > [...] > Basically I sampled two kernel function probes, > > probe kernel.function(arch_cpu_idle_enter).call > probe kernel.function(local_touch_nmi).call ... where one directly calls the other. Good test! > because local irq is disabled, I suppose the duration of these two > probes to be ZERO. Yes, ideally. > However, I run the attached stp script and here is what I got: > [...] > Total time: 10106 miliseconds > t_end @count=2908 @min=676 @max=88152 @sum=15005802 @avg=5160 > the duration varies from 676 to 88152, I understand all the tracing > mechanism should have overhead, while 676ns is acceptable, but 88152 > is not. > probe kernel.function("arch_cpu_idle_enter").call > t_start[cpu()] = gettimeofday_ns() > probe kernel.function("local_touch_nmi").call > t_end <<< gettimeofday_ns() - t_start[cpu()] A couple of things come to mind. Use of gettimeofday_ns() function uses a software-estimated real-time clock, which may not be steady enough for high-grained measurements like this. Consider using get_cycles() or the functions in the [man tapset::timestamp_monotonic] tapset. Another possibility is jitter like caching effects within the kernel due to activity elsewhere. The average is pretty low. Consider printing a histogram of t_end, or (if using git systemtap) its @variance. Another possibility is contention over the t_start[] variable, if multiple processors are running this region of code, so trigger the systemtap probes concurrently. systemtap automatically locks such global variables, esp. if they contain non-aggregate values like the scalar timestamps. Consider using "stap -t FOO.stp" to have the tool print internal estimates of its own probes' runtimes. That would include lock contention, but exclude kernel-side overheads like the code involved in the its kprobes machinery. - FChE