From: David Smith
To: Dave Brolley
CC: systemtap@sourceware.org
Date: Wed, 21 Sep 2011 19:46:00 -0000
Subject: Re: [Bug translator/13187] Reconsider the semantics of process(number).thread.begin/end
Message-ID: <4E7A3F00.8060207@redhat.com>
In-Reply-To: <4E70F10B.2080201@redhat.com>

On 09/14/2011 01:23 PM, Dave Brolley wrote:

Sorry for the delay in responding to this. I'm also sorry for the
incomplete response. But here goes with what I've got.

... stuff deleted ...

Let's start with a discussion of 'PID'. What exactly does that mean?
You probably already know this, I just want to get our terminology
straight.

From a user's point of view, there are process IDs (returned by
getpid()) and thread IDs (returned by gettid()). The gettid() man page
states:

    In a single-threaded process, the thread ID is equal to the
    process ID (PID, as returned by getpid(2)). In a multithreaded
    process, all threads have the same PID, but each one has a unique
    TID.

Here's where things get confusing. From the kernel's point of view, the
thread ID is called 'pid' in the task structure and the process ID is
called 'tgid' (thread group id) in the task structure. (It happened
this way for historical reasons.)

To avoid confusion, I'm going to call an individual thread a 'task' and
the group of tasks with the same 'tgid' value a 'thread group'.

[Here's something I've rediscovered about 'process(PID).*' probes. The
task_finder only looks for processes in its initial pass. So, after
stap starts up, it looks for all the PID probes, then never looks for
them again. This was actually done on purpose, since task ids are not
predictable. If you ask us to probe a PID, that process finishes, and
later a completely different process starts and ends up with the same
PID (because PIDs wrap after a certain number), we didn't want to
accidentally process that one. The thought was that if you're probing
specific PIDs, you meant those very tasks, not random ones that might
come along later.]

While researching this, I've discovered a bug in the task_finder. Where
we're talking about 'process(PID).*' probes, there is actually some
confusion in the task_finder code about what PID means (which actually
relates to your question here). This command:

    stap -x PID -e 'probe process.* {}'

treats PID differently than this one:

    stap -e 'probe process(PID).* {}'

In the first one, we look for a thread group id value of PID. In the
second case, we look for a task id value of PID.

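To make the terminology concrete, here is a minimal user-space sketch
of the PID/TID distinction. It is not part of systemtap. It assumes
Linux and Python 3.8 or later, where threading.get_native_id() reports
the kernel's per-task id (what gettid() returns). The helper names
ids() and ids_in_new_thread() are made up for illustration.

```python
import os
import threading


def ids():
    """Return (process ID, thread ID) as user space sees them."""
    # os.getpid() corresponds to the kernel's 'tgid' (thread group id).
    # threading.get_native_id() corresponds to the kernel's per-task
    # 'pid' on Linux, i.e. the value gettid() returns.
    return os.getpid(), threading.get_native_id()


def ids_in_new_thread():
    """Collect the same pair of ids from inside a freshly started thread."""
    result = []
    t = threading.Thread(target=lambda: result.append(ids()))
    t.start()
    t.join()
    return result[0]


main_pid, main_tid = ids()
child_pid, child_tid = ids_in_new_thread()

# Both tasks belong to the same thread group, so the PID matches,
# but each task has its own TID.
print("calling thread: pid %d tid %d" % (main_pid, main_tid))
print("new thread:     pid %d tid %d" % (child_pid, child_tid))
```

Both lines should print the same pid (the thread group id), while the
tid values differ between the two tasks.
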
> I think that there is a lack of orthogonality in the current
> implementation that is confusing. At least it is for me.
>
> 1) stap -e 'process("PATH").thread.begin {}' catches *all* child threads
> of *processes* (not tasks) identified by PATH, *as they start*

Your "as they start" distinction is not correct.
'process("PATH").begin' and 'process("PATH").thread.begin' probes will
fire when attaching to existing tasks. So, if you run

    stap -e 'probe process("PATH").thread.begin {}'

that .thread.begin probe will fire for all existing threads whose
execname is PATH. The .thread.begin probe will also fire later for all
new threads whose execname is PATH.

Also note here that the .thread.begin probes fire in the context of the
new thread, not the parent thread. If your probe was something like:

    probe process("PATH").thread.begin {
        printf("pid %d tid %d\n", pid(), tid())
    }

You'll see 2 different numbers there.

[Internally, the task_finder first loops through all existing tasks,
looking for PATH, and also attaches a probe to every task in the
system, to help in monitoring new threads.]

> 2) stap -e 'process.thread.begin {}' -c CMD catches *all* child threads
> of the specific *process* (not task) created by running CMD, *as they
> start*

To avoid any misunderstandings, let's break this one down:

- 'process.thread.begin' means we're interested in *all* threads in the
  system.

- The '-c CMD' (which basically devolves into a '-x PID') means we're
  *only* interested in PID. [See 'NOTE 1' for more '-c CMD' details.]

We resolve this conflict by being interested in all threads started by
PID.

[Internally, the task_finder first loops through all existing tasks,
looking for PID, and only attaches a probe to PID that monitors for new
threads.]

> whereas
>
> 3) stap -e 'process(NUMBER).thread.begin {}': catches only the thread
> with task id equal to NUMBER.
>
> The behavior of variant 3 is not intuitive at all, to me, given the
> behavior of the other two variants, combined with the name of the probe
> itself being process(NUMBER), and not task(NUMBER).
>
> Furthermore, the number in
>
> process(number).statement(stmtnumber).absolute
> process(number).statement(stmtnumber).absolute.return
> process(number).syscall
> process(number).syscall.return
>
> refers to the process id and these probes all fire in the main thread of
> the process with that id and also in its children. i.e. it simply
> identifies the target process. So why should the number in not also
> refer to the process id?

I haven't tested this, but I'd bet you are incorrect there about
'process(PID).syscall' probes. I'll bet they only apply to task id PID,
not thread group id PID.

... more stuff deleted ...

I'm going to have to ignore the rest of this email for now, or I'll
never send this response.

NOTE 1: Here's what happens when you use 'stap -c CMD':

- stap passes the '-c CMD' argument down to staprun
- staprun loads the module and passes the '-c CMD' arg down to stapio
- stapio runs the command and eventually sends the pid to the module

There ends up being little difference (except a little timing) between
"stap -c CMD foo.stp" and "CMD; stap -x PID foo.stp".

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)