From: Aubrey Li
Date: Tue, 29 Nov 2016 03:31:00 -0000
Subject: Re: kernel function probe overhead
To: "Frank Ch. Eigler"
Cc: systemtap@sourceware.org

On Tue, Nov 29, 2016 at 3:19 AM, Frank Ch. Eigler wrote:
>
> aubreylee wrote:
>
>>> A couple of things come to mind.  The gettimeofday_ns() function uses
>>> a software-estimated real-time clock, which may not be steady enough
>>> for fine-grained measurements like this.
>>
>> Timestamp function: gettimeofday_ns()
>> [...]
>> Timestamp function: get_cycles()
>> [...]
>> Timestamp function: local_clock_ns()
>> [...]
>
> Thanks for testing!  The distribution of those three seems remarkably
> similar - a nice central peak, and a second small peak at 4-8x the
> first.  It suggests we can probably rule out clock problems.
>
>>> The average is pretty low.  Consider printing a histogram of t_end,
>>> or (if using git systemtap) its @variance.
>>>
>>> Another possibility is contention over the t_start[] variable, if
>>> multiple processors are running this region of code and so trigger
>>> the systemtap probes concurrently.  systemtap automatically locks
>>> such global variables, esp. if they contain non-aggregate values
>>> like the scalar timestamps.
>>
>> Do we have an approach to probe the function on CPU0 only?
>
> Not directly; the kernel does not give us such control.  The probe
> could include a
>
>     if (cpu() != 0) next;
>
> at the top of the probe handler body.  That way locks are still taken,
> but released quickly on the other CPUs.

Thanks for the suggestion.
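For reference, the measurement script with that filter added looks
roughly like this (a sketch, not necessarily the exact test2.stp; the
probe points follow the -t output further down, and local_clock_ns()
stands in for whichever timestamp function is under test):

  global t_start, t_end

  probe kernel.function("arch_cpu_idle_enter").call {
    if (cpu() != 0) next               # bail out early on every CPU but 0
    t_start[tid()] = local_clock_ns()
  }

  probe kernel.function("local_touch_nmi").call {
    if (cpu() != 0) next
    if (tid() in t_start) {
      t_end <<< local_clock_ns() - t_start[tid()]   # aggregate the delta
      delete t_start[tid()]
    }
  }

  probe end {
    if (@count(t_end))
      printf("t_end @count=%d @min=%d @max=%d @sum=%d @avg=%d\n",
             @count(t_end), @min(t_end), @max(t_end), @sum(t_end),
             @avg(t_end))
  }

Note that the locks around t_start[] are still taken on every CPU, as
you said; the filter only makes the non-CPU0 handlers release them
quickly.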
Original result:

t_end @count=4800 @min=740 @max=300281 @sum=90589877 @avg=18872

With "if (cpu() != 0) next" added in the probe handler:

t_end @count=161 @min=653 @max=114856 @sum=1780522 @avg=11059

And if I add "maxcpus=1" to the kernel cmdline:

t_end @count=1190 @min=573 @max=14529 @sum=1406437 @avg=1181

The last one is much better, so it looks like we have a scalability
opportunity here.

> Another solution would be to have per-thread global variables.  Since
> the kernel doesn't help us out with this, we'd have to emulate it
> somehow, and for this to be worthwhile, it'd have to work faster than
> ordinary locked/shared array[tid()] accesses.

Is this a suggestion for me?  How would I do this?

>>> Consider using "stap -t FOO.stp" to have the tool print internal
>>> estimates of its own probes' runtimes.  That would include lock
>>> contention, but exclude kernel-side overheads like the code involved
>>> in its kprobes machinery.
>>
>> "-t" option output.
>
> Ah, I note that we don't track occurrences of lock contention, even
> though we could.  Opened
> https://sourceware.org/bugzilla/show_bug.cgi?id=20879

Okay, thanks, I'll keep an eye on this ticket.

>> [...]
>> kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call,
>> (test2.stp:13:1), hits: 2801, cycles: 560min/34290avg/230611max, from:
>> [...]
>> kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call,
>> (test2.stp:18:1), hits: 2801, cycles: 396min/5118avg/160096max, from:
>> [...]
>
> That inconsistency suggests dramatic contention or caching or some
> other source of jitter.  Having the variance figure in there would be
> useful, though.  Would anyone like to give that a try?  (translate.cxx
> line 2283ish and runtime/linux/common_session_state.h _stp_stat_init
> area).

If there is a patch delivered against git head, I can apply it and
paste the output here.

Thanks,
-Aubrey

> - FChE
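P.S. For the earlier suggestion to print a histogram of t_end or its
@variance, the reporting probe in a sketch like the one above would
look roughly like this (@variance needs a git build of systemtap, per
the note in the thread):

  probe end {
    if (@count(t_end)) {
      print(@hist_log(t_end))          # base-2 log histogram of the deltas
      printf("t_end @avg=%d @variance=%d\n",
             @avg(t_end), @variance(t_end))
    }
  }

That would make the second peak you mentioned directly visible instead
of being folded into the average.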