Subject: Re: Some newbie questions
From: Avi Kivity
To: David Smith, systemtap@sourceware.org
Date: Tue, 09 Aug 2016 06:51:00 -0000

On 08/08/2016 05:58 PM, David Smith wrote:
>
>> 2. Inlined functions
>>
>> From the manual pages, it seems that inlined functions can be probed
>> (minus the .return probe), but I just get an error:
>>
>> semantic error: while resolving probe point: identifier 'process' at
>> script/home/avi/seastar/debug/task-latency.stap:3:7
>> source: probe process("scylla").function("reactor::run_tasks()") {
>>               ^
>>
>> semantic error: no match (similar functions:
>> _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs)
> We're at the mercy of the compiler and the quality of the debuginfo it
> produces here when looking for inlined functions.
>
> If you want to investigate further here, you'll need to do the following:
>
> # eu-readelf -N --debug-dump=info scylla > scylla.log
>
> In scylla.log, look and see if you can find a subprogram with the name
> of the function you are interested in.
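(As a cross-check before reading the raw DWARF, a wildcard probe shows which
variants of the name stap itself can resolve; "stap -l" with the same probe
point lists the matches without running anything. This is an untested sketch,
assuming the same "scylla" binary as in the error above.)

    probe process("scylla").function("*run_tasks*") {
        # print whichever function name stap actually resolved, then stop
        println(ppfunc())
        exit()
    }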
 [1417f4]    subprogram
             external             (flag_present) Yes
             name                 (strp) "run_tasks"
             decl_file            (data1) 5
             decl_line            (data2) 812
             linkage_name         (strp) "_ZN7reactor9run_tasksER15circular_bufferISt10unique_ptrI4taskSt14default_deleteIS2_EESaIS5_EE"
             declaration          (flag_present) Yes
             object_pointer       (ref4) [141808]
             sibling              (ref4) [141813]
 [141808]      formal_parameter
               type               (ref4) [1423bd]
               artificial         (flag_present) Yes
 [14180d]      formal_parameter
               type               (ref4) [14a8a8]

The debugger has no trouble setting breakpoints on the function or
showing it in backtraces.

In any case, I sprinkled some static probes there.

>> 3. Process CPU timers
>>
>> (more of a feature request)
>>
>> I'm trying to find causes of latency in my program. To do that, I'm
>> running a periodic timer and checking whether a function takes more
>> time than some threshold.
>>
>> Ideally, I'd be able to arm the timer on the function entry point and
>> disarm it on exit, rather than have it run continuously; this would
>> need to be a per-thread cpu-time timer (e.g. CLOCK_THREAD_CPUTIME_ID).
>>
>> Here's my current script for reference ("running" and "start_time"
>> need to become maps for it to be thread-safe):
>>
>> #!/usr/bin/stap
>>
>> global start_time
>> global running
>>
>> probe begin {
>>     running = 0
>> }
>>
>> probe
>> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
>> {
>>     start_time = gettimeofday_us()
>>     running = 1
>> }
>>
>> probe
>> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
>> {
>>     running = 0
>> }
>>
>> probe timer.ms(10) {
>>     now = gettimeofday_us()
>>     if (running && (now - start_time) > 30000) {
>>         printf("detected tasks running for >30ms\n")
>>         print_usyms(ubacktrace())
>>     }
>> }
>>
>> I'd appreciate any tips as to whether there's a better way to do this.
> The above isn't really going to work well, for several reasons:
>
> 1) You've only got one global 'start_time' and one global 'running'
> variable. If scylla is multithreaded or more than one instance is
> running, that isn't going to work.
>
> To fix this, you'd want to convert them to arrays and index them by
> thread ids, like this:
>
>     start_time[tid()] = gettimeofday_us()

Yes, I found that out and my latest version works like this, but I ran
into severe scaling issues.

I also switched to timer.profile. What's its resolution on normal
machines?

> 2) Your ubacktrace() call in the timer function isn't going to report
> what you want. At that point, you aren't running in the context of the
> process, you are running in the context of whatever kernel thread is
> running the timer. I'm not really sure what a user backtrace will
> report at that point, but it certainly won't be useful.

Actually, I got good backtraces when I instrumented a task to spin for
100ms. I also got some bad traces, but did not investigate further.

> Here's an (untested) version of what would work a bit better:
>
> ====
> global start_time
>
> probe
> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
> {
>     start_time[tid()] = gettimeofday_us()
> }
>
> probe
> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
> {
>     now = gettimeofday_us()
>     start = start_time[tid()]
>     delete start_time[tid()]
>     if ((now - start) > 30000) {
>         printf("detected tasks running for >30ms (%d)\n",
>                (now - start))
>         print_usyms(ubacktrace())

Here the backtrace is useless: I know exactly what it looks like at
this point; it's two frames away from main().

To give some context, this is a run-to-completion scheduler.
I have tasks pushed into a queue from various sources (including other
tasks), and this bit is the part of the main loop that dequeues a task,
runs it, and repeats. Some tasks take too long, and I want to identify
which ones. I could print the last task's vtable, but that is much less
useful than a full backtrace taken at the time the task is detected as
taking too long.

>     }
> }
> ====
>
> If you like, you could add a timer probe to the above to report an
> ongoing problem in the current process (but without printing a
> backtrace).
>

Unfortunately I already know there is a problem and need to pinpoint
it, which is where the backtrace is so useful.

I can fall back to instrumenting the code with a timer (from
timer_create(3)), but I'd much rather have the flexibility of
non-intrusive instrumentation.
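(For reference, an untested sketch of the combination discussed above: the
per-tid start_time map from David's version plus a timer.profile probe.
timer.profile fires at the kernel's profiling tick, typically HZ, i.e. every
1-10 ms on common configs, and it runs in the context of whichever thread the
tick interrupted, so tid() and ubacktrace() there refer to the running task.
The paths, marker names, and 30 ms threshold are taken from the scripts above;
everything else here is an assumption, not a tested recipe.)

    global start_time

    probe
    process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
    {
        start_time[tid()] = gettimeofday_us()
    }

    probe
    process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
    {
        delete start_time[tid()]
    }

    probe timer.profile {
        # Fires in the context of the interrupted thread, so this backtrace
        # is the long-running task's own, taken while it is still running.
        t = tid()
        if (([t] in start_time) && (gettimeofday_us() - start_time[t] > 30000)) {
            printf("task on tid %d running for >30ms\n", t)
            print_usyms(ubacktrace())
            delete start_time[t]   # report each overlong task only once
        }
    }

Deleting the map entry after the first report keeps a single stuck task from
flooding the output with one backtrace per tick.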