From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-18419-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 12795 invoked by alias); 21 Sep 2011 19:46:29 -0000
Received: (qmail 12785 invoked by uid 22791); 21 Sep 2011 19:46:28 -0000
X-SWARE-Spam-Status: No, hits=-6.5 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 21 Sep 2011 19:46:10 +0000
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p8LJkAX5014235	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)	for <systemtap@sourceware.org>; Wed, 21 Sep 2011 15:46:10 -0400
Received: from t510.usersys.redhat.com (vpn-8-209.rdu.redhat.com [10.11.8.209])	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p8LJk86n026060	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);	Wed, 21 Sep 2011 15:46:09 -0400
Message-ID: <4E7A3F00.8060207@redhat.com>
Date: Wed, 21 Sep 2011 19:46:00 -0000
From: David Smith <dsmith@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20110906 Thunderbird/6.0.2
MIME-Version: 1.0
To: Dave Brolley <brolley@redhat.com>
CC: systemtap@sourceware.org
Subject: Re: [Bug translator/13187] Reconsider the semantics of process(number).thread.begin/end
References: <4E70F10B.2080201@redhat.com>
In-Reply-To: <4E70F10B.2080201@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <systemtap.sourceware.org>
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2011-q3/txt/msg00373.txt.bz2

On 09/14/2011 01:23 PM, Dave Brolley wrote:

Sorry for the delay in responding to this.  I'm also sorry for the
incomplete response.  But here goes with what I've got.

... stuff deleted ...

Let's start with a discussion of 'PID'.  What exactly does that mean?
You probably already know this; I just want to get our terminology straight.

From a user's point of view, there are process IDs (returned by
getpid()) and thread IDs (returned by gettid()).  The gettid() man page
states:

    In a single-threaded process, the thread ID is equal to the
    process ID (PID, as returned  by getpid(2)).  In a
    multithreaded process, all threads have the same PID,
    but each one has a unique TID.
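To make the user-space view concrete, here's a small C sketch (assuming Linux with glibc and POSIX threads; the struct and function names are made up for illustration). It records getpid() and the thread ID from the main thread and from one new thread. It uses the raw system call, syscall(SYS_gettid), instead of the gettid() wrapper, which only appeared in glibc 2.30.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The two IDs as seen from inside one thread (hypothetical helper type). */
struct user_ids {
    pid_t pid;  /* from getpid() */
    pid_t tid;  /* from the gettid system call */
};

/* Raw system call, so this also builds on glibc older than 2.30. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread entry point: record the caller's PID and TID into *arg. */
static void *record_user_ids(void *arg)
{
    struct user_ids *out = arg;

    out->pid = getpid();
    out->tid = current_tid();
    return NULL;
}

/* Fill in the IDs seen by the calling thread and by one newly created
 * thread. */
static void sample_user_ids(struct user_ids *main_ids,
                            struct user_ids *new_ids)
{
    pthread_t thread;
    int rc;

    record_user_ids(main_ids);
    rc = pthread_create(&thread, NULL, record_user_ids, new_ids);
    assert(rc == 0);
    (void)rc;
    pthread_join(thread, NULL);
}
```

Called from main() (built with 'cc -pthread'), sample_user_ids() should leave main_ids.pid equal to main_ids.tid, new_ids.pid equal to main_ids.pid, and new_ids.tid different from both, which is what the man page describes.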

Here's where things get confusing.  From the kernel's point of view, the
thread ID is called 'pid' in the task structure and the process ID is
called 'tgid' (thread group id) in the task structure.  (It happened
this way for historical reasons.)
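One way to see the kernel's naming directly is the per-task status file in /proc: its "Pid:" line is the task's 'pid' field (the thread ID) and its "Tgid:" line is 'tgid' (the process ID).  The sketch below assumes Linux with /proc mounted; status_field() and the other names are hypothetical helpers, not part of any systemtap API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel-side names for the two IDs of one task. */
struct kernel_ids {
    long pid;   /* kernel 'pid'  == user-space thread ID (TID) */
    long tgid;  /* kernel 'tgid' == user-space process ID (PID) */
};

/* Return the number on the "name:" line of a /proc status file, or -1 if
 * the file can't be read or the field isn't there. */
static long status_field(const char *path, const char *name)
{
    char line[256];
    size_t len = strlen(name);
    long value = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, name, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}

/* Read the calling thread's own entry under /proc/self/task/<tid>/. */
static struct kernel_ids current_kernel_ids(void)
{
    char path[64];
    struct kernel_ids ids;

    snprintf(path, sizeof path, "/proc/self/task/%ld/status",
             (long)syscall(SYS_gettid));
    ids.pid = status_field(path, "Pid");
    ids.tgid = status_field(path, "Tgid");
    return ids;
}

/* Thread entry point: store the new thread's kernel IDs into *arg. */
static void *record_kernel_ids(void *arg)
{
    *(struct kernel_ids *)arg = current_kernel_ids();
    return NULL;
}
```

Run from a second thread, current_kernel_ids() reports a 'tgid' equal to getpid() and a 'pid' that differs from it; from the main thread (the thread group leader) the two are equal.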

To avoid confusion, I'm going to call an individual thread a 'task' and
the group of tasks with the same 'tgid' value a 'thread group'.

[Here's something I've rediscovered about 'process(PID).*'
probes.  For these probes, the task_finder only looks for processes in
its initial pass.  So, after stap starts up, it looks for all the PID
probes, then never looks for them again.  This was actually done on
purpose, since task ids are unpredictable.  If you ask us to probe a
PID, that process finishes, and later a completely different process
starts and ends up with the same PID (because PIDs wrap around after a
certain value), we didn't want to accidentally probe that new process.
The thought was that if you're probing specific PIDs, you meant those
very tasks, not random ones that might come along later.]

While researching this, I've discovered a bug in the task_finder.  When
it comes to 'process(PID).*' probes, the task_finder code is actually
inconsistent about what PID means (which relates to your question
here).  This command:

  stap -x PID -e 'probe process.* {}'

treats PID differently than this one:

  stap -e 'probe process(PID).* {}'

In the first case, we look for a thread group id equal to PID.  In the
second case, we look for a task id equal to PID.
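To make the inconsistency concrete, here's a tiny C sketch of the two matching rules.  It is not the task_finder's actual code: struct task_ids and both predicates are hypothetical, and a real task carries far more state.  The point is that the two rules agree for a thread group leader (pid == tgid) but disagree for every other thread in the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified view of a task: just the two kernel IDs. */
struct task_ids {
    pid_t pid;   /* kernel 'pid'  (task/thread id) */
    pid_t tgid;  /* kernel 'tgid' (thread group/process id) */
};

/* 'stap -x PID' rule: match any task in thread group PID. */
static bool matches_thread_group(const struct task_ids *t, pid_t target)
{
    return t->tgid == target;
}

/* 'process(PID)' rule: match only the single task whose task id is PID. */
static bool matches_task_id(const struct task_ids *t, pid_t target)
{
    return t->pid == target;
}
```

With a leader {pid 1000, tgid 1000} and a second thread {pid 1001, tgid 1000}, the '-x 1000' rule matches both tasks, while the 'process(1000)' rule matches only the leader.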

> I think that there is a lack of orthogonality in the current
> implementation that is confusing. At least it is for me.
> 
> 1) stap -e 'process("PATH").thread.begin {}' catches *all* child threads
> of *processes* (not tasks) identified by PATH, *as they start*


Your "as they start" distinction is not correct.
'process("PATH").begin' and 'process("PATH").thread.begin' probes will
also fire when stap attaches to already-existing tasks.

So, if you run "stap -e 'probe process("PATH").thread.begin {}'", that
.thread.begin probe will fire for all existing threads whose execname is
PATH.  Later, it will also fire for every new thread whose execname is
PATH.

Also note that .thread.begin probes fire in the context of the new
thread, not the parent thread.  If your probe was something like:

    probe process("PATH").thread.begin {
        printf("pid %d tid %d\n", pid(), tid())
    }

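For a newly created thread, you'll see two different numbers there:
pid() is the thread group id and tid() is the new thread's own task id.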

[Internally, the task_finder first loops through all existing tasks
looking for PATH, and also attaches a probe to every task in the system
to help monitor new threads.]

> 2) stap -e 'process.thread.begin {}' -c CMD catches *all* child threads
> of the specific *process* (not task) created by running CMD, *as they
> start*


To avoid any misunderstandings, let's break this one down:

- 'process.thread.begin' means we're interested in *all* threads in the
system.

- The '-c CMD' (which basically devolves into a '-x PID') means we're
*only* interested in PID.  [See 'NOTE 1' for more '-c CMD' details.]

We resolve this conflict by being interested in all threads started by PID.

[Internally, the task_finder first loops through all existing tasks
looking for PID, and attaches a probe that monitors for new threads
only to PID.]

> whereas
> 
> 3) stap -e 'process(NUMBER).thread.begin {}': catches only the thread
> with task id equal to NUMBER.
> 
> The behavior of variant 3 is not intuitive at all, to me, given the
> behavior of the other two variants, combined with the name of the probe
> itself being process(NUMBER), and not task(NUMBER).
> 
> Furthermore, the number in
> 
>   process(number).statement(stmtnumber).absolute
>   process(number).statement(stmtnumber).absolute.return
>   process(number).syscall
>   process(number).syscall.return
> 
> refers to the process id and these probes all fire in the main thread of
> the process with that id and also in its children. i.e. it simply
> identifies the target process. So why should the number in
> process(number).thread.begin not also refer to the process id?


I haven't tested this, but I'd bet you are wrong there about
'process(PID).syscall' probes.  I suspect they only apply to the task
with task id PID, not to the whole thread group with id PID.

... more stuff deleted ...

I'm going to have to ignore the rest of this email for now, or I'll
never send this response.


NOTE 1: Here's what happens when you use 'stap -c CMD'

- stap passes the '-c CMD' argument down to staprun

- staprun loads the module and passes the '-c CMD' arg down to stapio

- stapio runs the command and eventually sends the pid to the module

There ends up being little difference (apart from some timing) between
"stap -c CMD foo.stp" and "CMD; stap -x PID foo.stp", where PID is the
process id of CMD.
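The last step is essentially the usual fork/exec pattern: the parent gets the child's PID straight from fork(), and that PID is what gets handed on, much as with '-x PID'.  This C sketch only illustrates that general pattern and is not stapio's code; start_command() is a hypothetical name, and the real tool presumably also controls when CMD actually starts relative to the module (the timing difference mentioned above), which is omitted here.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv[0] (searched on PATH) in a child process and return the
 * child's PID, or -1 if fork() fails.  The returned PID is the number a
 * '-x PID' style hand-off would need. */
static pid_t start_command(char *const argv[])
{
    pid_t child = fork();

    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    return child;
}
```

The caller is still responsible for reaping the child with waitpid() once it is done with it.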

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)