From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-13703-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 21602 invoked by alias); 25 Sep 2009 22:52:49 -0000
Received: (qmail 21595 invoked by uid 22791); 25 Sep 2009 22:52:48 -0000
X-SWARE-Spam-Status: No, hits=-0.5 required=5.0 	tests=AWL,BAYES_00,J_CHICKENPOX_63,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 25 Sep 2009 22:52:41 +0000
Received: from int-mx05.intmail.prod.int.phx2.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.18]) 	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n8PMqefp005865 	for <systemtap@sources.redhat.com>; Fri, 25 Sep 2009 18:52:40 -0400
Received: from [10.3.224.244] (vpn-224-244.phx2.redhat.com [10.3.224.244]) 	by int-mx05.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id n8PMqdec003562; 	Fri, 25 Sep 2009 18:52:40 -0400
Message-ID: <4ABD49B7.60105@redhat.com>
Date: Fri, 25 Sep 2009 22:52:00 -0000
From: Josh Stone <jistone@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20090922 Fedora/3.0-2.7.b4.fc11 Lightning/1.0pre Thunderbird/3.0b4
MIME-Version: 1.0
To: Kiran <kiran@linux.vnet.ibm.com>
CC: systemtap@sources.redhat.com
Subject: Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints
References: <1253715217.5193.115.camel@kiran-laptop>
In-Reply-To: <1253715217.5193.115.camel@kiran-laptop>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <systemtap.sourceware.org>
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2009-q3/txt/msg00854.txt.bz2

Hi,

On 09/23/2009 07:13 AM, Kiran wrote:
> +probe scheduler.ctxswitch.tp = kernel.trace("sched_switch") 

These .tp and .kp variants are an internal detail -- please put them
under __scheduler so they are in a more private namespace from what's
exposed to the user.

> +/**
> + * probe scheduler.migrate_task - Traces the migration of the tasks across cpus by the scheduler.
> + * @pid: pid of the task being migrated.
> + * @priority: priority of the task being migrated.
> + * @original_cpu: the original cpu
> + * @destination_cpu: the destination cpu
> + */

There is already a "scheduler.migrate" defined, so please merge with
that instead of creating a new one.

> +/**
> + * probe scheduler.process_exit - Fires when a process exits
> + * @pid: pid of the process exiting
> + * @priority: priority of the process exiting
> + */

This overlaps with the less well-known "kprocess.exit".  I think in this
case we can make the kprocess one point to yours.

> +/**
> + * probe scheduler.process_fork - Probes the tracepoint for forking a process
> + * @parent_pid: PID of the parent process
> + * @child_pid: PID of the child process
> + */

The "kprocess.create" is also similar to this.

> +/**
> + * probe scheduler.signal_send - Probes the tracepoint for sending a signal
> + * @pid: pid of the process sending signal
> + * @signal_number: signal number
> + */

Here we have the whole signal.stp tapset, which includes a "signal.send"
probe point.

It's not easy to decide how to resolve the overlaps in different
tapsets, but please consider it.  At a minimum, they should leverage
each other so they're all using the best available probe points.


> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-22 02:29:16.000000000 -0400
> [...]
> +probe scheduler.wakeup
> +{
> +	pids[task_pid]++
> +	processes[task_pid] = $p;
> +	prev[task_pid] = task_current()
> +	
> +}

If this is just so you can print the "+" line of who wokeup whom, why
not print that right away?  Then you don't need those arrays at all.
Saving task_struct pointers is messy business.

> +probe scheduler.ctxswitch
> +{
> +	tid = next_tid
> +	tid1 = prev_tid
> +	state = prev_state
> +	state1 = next_state

Why copy these values?  The prev/next variable names already reflect
more meaning.

> +	
> +	%( $# == 2 %?
> +	
> +	if(@1 == "pid") 
> +		if (tid != $2 && tid1 != $2)
> +			next
> +	if(@1 == "name")
> +		if (task_execname(task_current()) != @2 && task_execname($next) != @2)
> +               		next 

Note that task_execname(task_current()) is the same as simply
execname().  But anyway, your tapset provides prev_task_name and
next_task_name, so just use those.

> +	
> +	foreach (name in pids-) {
> +		if ((@1 == "pid" && (name == $2 || task_pid(prev[name]) == $2)) || 
> +		   (@1 == "name" && (task_execname(prev[name]) == @2 || task_execname(processes[name]) == @2)))
> +			printf("%s\t\t%d\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> +				task_execname(prev[name]), task_pid(prev[name]), task_cpu(processes[name]), gettimeofday_ns(), 
> +				task_pid(prev[name]), task_prio(prev[name]), state_calc(task_state(prev[name])), 
> +				task_pid(processes[name]), task_prio(processes[name]), state_calc(task_state(processes[name])), 
> +				task_execname(processes[name]))
> +	} %)

Hmm, "name" is not a name, which is confusing.  But if you're looking
for the task which woke this one up, wouldn't that be prev[next_pid]?
(That should really be indexed by the tid, actually.)  Or, as I said
above, it may be easier to print this right in the wakeup probe.

> +
> +	old_cpu = task_cpu_old[tid]
> +	printf("%s\t\t%d\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",task_execname(task_current()),tid1,
> +		old_cpu,gettimeofday_ns(),tid1,task_prio(task_current()),state_calc(state),next_pid,
> +		next_prio,state_calc(next_state),next_task_name )
> +	task_cpu_old[next_tid] = cpu()
> +}

I think task_current() is probably always the same as prev_task, but
it's probably best to use the latter.  It would also be helpful if you
spaced this out so we can better see what's being printed.  I can see
two "tid1"s in there, which is a bit weird.

Thanks,

Josh