public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2] Scheduler Tapset based on kernel tracepoints
@ 2009-09-23 14:13 Kiran
  2009-09-25 20:20 ` Frank Ch. Eigler
  2009-09-25 22:52 ` Josh Stone
  0 siblings, 2 replies; 8+ messages in thread
From: Kiran @ 2009-09-23 14:13 UTC (permalink / raw)
  To: systemtap

Hi,

I have modified the patch according to the comments by Frank and
William.

Changelog
---------
  - Changed "?" to "!" while falling back to the kprobe based probe.
  - Added a level of indirection for the variables.

Signed-off-by: Kiran Prakash <kiran@linux.vnet.ibm.com>

diff -Naur systemtap-orig/tapset/scheduler.stp systemtap/tapset/scheduler.stp
--- systemtap-orig/tapset/scheduler.stp	2009-09-19 10:27:14.000000000 -0400
+++ systemtap/tapset/scheduler.stp	2009-09-22 07:32:04.000000000 -0400
@@ -118,32 +118,280 @@
  *  Arguments:
  *    prev_pid: The pid of the process to be switched out
  *    next_pid: The pid of the process to be switched in
+ *    prev_tid: The tid of the process to be switched out
+ *    next_tid: The tid of the process to be switched in	 
+ *    prev_task_name: The name of the process to be switched out
+ *    next_task_name: The name of the process to be switched in
  *    prevtsk_state: the state of the process to be switched out
+ *    nexttsk_state: the state of the process to be switched in
  */
-probe scheduler.ctxswitch =
+
+probe scheduler.ctxswitch.tp = kernel.trace("sched_switch") 
+{
+	next_pid =  $next->pid
+        next_pid = task_tid($next)
+        next_task = $next
+        next_task_name = task_execname($next)
+        nexttsk_state = $next->state
+	prev_pid = $prev->pid
+        prev_tid = task_tid(tid)
+        prev_task = $prev
+        prev_task_name = task_execname($prev)
+        prevtsk_state = $prev->state
+}
+
+probe scheduler.ctxswitch.kp =
 %( arch != "x86_64" && arch != "ia64" %?
-	kernel.trace("sched_switch") !, kernel.function("__switch_to")
+        kernel.function("__switch_to")
 %:
-	kernel.trace("sched_switch") !, kernel.function("context_switch")
+        kernel.function("context_switch")
 %)
 {
 %( arch == "powerpc" %?
-        prev_pid = $prev->pid
-        next_pid =  $new->pid
+	prev_pid = $prev->pid
+        next_pid = $new->pid
+        prev_tid = task_tid($prev)
+        next_pid = task_tid($new)
         prev_task = $prev
         next_task = $new
+        prev_task_name = task_execname($prev)
+        next_task_name = task_execname($new)
         prevtsk_state = $prev->state
+        nexttsk_state = $new->state
+
 %: %( arch == "x86_64" || arch == "ia64" %?
         prev_pid = $prev->pid
-        next_pid =  $next->pid
+        next_pid = $next->pid
+	prev_tid = task_tid($prev)
+        next_pid = task_tid($next)
         prev_task = $prev
         next_task = $next
+	prev_task_name = task_execname($prev)
+        next_task_name = task_execname($next)
         prevtsk_state = $prev->state
+	nexttsk_state = $next->state
 %:
         prev_pid = $prev_p->pid
         next_pid = $next_p->pid
+	prev_tid = task_tid($prev_p)
+        next_pid = task_tid($next_p)
         prev_task = $prev_p
         next_task = $next_p
+	prev_task_name = task_execname($prev_p)
+        next_task_name = task_execname($next_p)
         prevtsk_state = $prev_p->state
+	nexttsk_state = $next_p->state
 %) %)
 }
+
+probe scheduler.ctxswitch
+ = scheduler.ctxswitch.tp !, scheduler.ctxswitch.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop - Fires when a thread created by kthread_create is stopped.
+ * @thread_pid: pid of the thread being stopped.
+ * @thread_priority: priority of the thread.
+ */
+probe scheduler.kthread_stop.kp = kernel.function("kthread_stop")
+{
+	thread_pid = $k->pid
+	thread_priority = $k->priority
+}
+probe scheduler.kthread_stop.tp = kernel.trace("sched_kthread_stop") 
+{
+        thread_pid = $t->pid
+        thread_priority = $t->prio
+}
+probe scheduler.kthread_stop 
+   = scheduler.kthread_stop.tp !,
+     scheduler.kthread_stop.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop.return - Fires once the kthread is stopped and gets the return value
+ * @return_value: return value after stopping the thread.
+ */
+
+probe scheduler.kthread_stop.return.kp = kernel.function("kthread_stop").return
+{
+	return_value = $k->exit_code
+}
+probe scheduler.kthread_stop.return.tp = kernel.trace("sched_kthread_stop_ret")
+{
+        return_value = $ret
+}
+
+probe scheduler.kthread_stop.return 
+ = scheduler.kthread_stop.return.tp !,
+   scheduler.kthread_stop.return.kp
+{}
+
+/**
+ * probe scheduler.wait_task - Fires when waiting on a task to unschedule. 
+ *                             It waits till the task becomes inactive.
+ * @task_pid: pid of the task the scheduler is waiting on.
+ * @task_priority: priority of the task
+ */
+
+probe scheduler.wait_task
+ = kernel.trace("sched_wait_task") !,
+   kernel.function("wait_task_inactive")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+}
+
+/**
+ * probe scheduler.wakeup - Fires when a task is woken up 
+ * @task_pid: pid of the task being woken up
+ * @task_priority: priority of the task being woken up
+ * @success: returns 1 if the wakeup is successful
+ */
+
+probe scheduler.wakeup
+ = kernel.trace("sched_wakeup") !,
+   kernel.function("try_to_wake_up")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+        success = $success
+
+}
+
+/**
+ * probe scheduler.wakeup_new - Fires when a newly created task is woken up for the first time
+ * @task_pid: pid of the new task woken up
+ * @task_priority: priority of the new task
+ * @success: returns 1 if the wake-up is successful
+ */
+probe scheduler.wakeup_new
+ = kernel.trace("sched_wakeup_new") !,
+   kernel.function("wake_up_new_task")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+}
+
+/**
+ * probe scheduler.migrate_task - Traces the migration of the tasks across cpus by the scheduler.
+ * @pid: pid of the task being migrated.
+ * @priority: priority of the task being migrated.
+ * @original_cpu: the original cpu
+ * @destination_cpu: the destination cpu
+ */
+probe scheduler.migrate_task.kp = kernel.function("set_task_cpu")
+{
+	destination_cpu = $new_cpu
+}
+probe scheduler.migrate_task.tp = kernel.trace("sched_migrate_task")
+{
+	destination_cpu = $dest_cpu
+}
+probe scheduler.migrate_task
+ = scheduler.migrate_task.tp !,
+   scheduler.migrate_task.kp
+{
+	pid = $p->pid
+        priority = $p->prio
+        original_cpu = task_cpu($p)
+}
+/**
+ * probe scheduler.process_free - Traces the process of freeing up of a process
+ * @pid: PID of the process getting freed
+ * @priority: priority of the process getting freed
+ */
+probe scheduler.process_free.kp = kernel.function("delayed_put_task_struct")
+{
+	pid = $tsk->pid
+	priority = $tsk->prio
+}
+probe scheduler.process_free.tp = kernel.trace("sched_process_free")
+{
+        pid = $p->pid
+        priority = $p->prio
+}
+probe scheduler.process_free
+ = scheduler.process_free.tp !,
+   scheduler.process_free.kp
+{}
+
+/**
+ * probe scheduler.process_exit - Fires when a process exits
+ * @pid: pid of the process exiting
+ * @priority: priority of the process exiting
+ */
+probe scheduler.process_exit.kp = kernel.function("do_exit")
+{
+	pid = $tsk->pid
+	priority = $tsk->priority
+}
+probe scheduler.process_exit.tp = kernel.trace("sched_process_exit")
+{
+        pid = $p->pid
+        priority = $p->prio
+}
+
+probe scheduler.process_exit 
+ = scheduler.process_exit.tp !,
+   scheduler.process_exit.kp
+{}
+
+/**
+ * probe scheduler.process_wait - Fires when scheduler waits on a process
+ * @pid: PID of the process scheduler is waiting on
+ */
+probe scheduler.process_wait.kp = kernel.function("do_wait")
+{
+	pid = $wo->wo_pid
+}
+probe scheduler.process_wait.tp = kernel.trace("sched_process_wait")
+{
+        pid = $pid
+}
+probe scheduler.process_wait
+ = scheduler.process_wait.tp !,
+   scheduler.process_wait.kp
+{}
+
+/**
+ * probe scheduler.process_fork - Probes the tracepoint for forking a process
+ * @parent_pid: PID of the parent process
+ * @child_pid: PID of the child process
+ */
+probe scheduler.process_fork.kp = kernel.function("do_fork")
+{
+	parent_pid = $current->pid
+	child_pid = $p->pid
+}
+probe scheduler.process_fork.tp = kernel.trace("sched_process_fork")
+{
+        parent_pid = $parent->pid
+        child_pid = $child->pid
+}
+
+probe scheduler.process_fork
+ = scheduler.process_fork.tp !,
+   scheduler.process_fork.kp
+{}
+/**
+ * probe scheduler.signal_send - Probes the tracepoint for sending a signal
+ * @pid: pid of the process sending signal
+ * @signal_number: signal number
+ */
+probe scheduler.signal_send.kp = kernel.function("__send_signal")
+{
+	pid = $t->pid
+}
+probe scheduler.signal_send.tp = kernel.trace("sched_signal_send")
+{
+        pid = $p->pid
+}
+probe scheduler.signal_send
+ = scheduler.signal_send.tp !,
+   scheduler.signal_send.kp
+{
+	signal_number = $sig
+}
diff -Naur systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp systemtap/testsuite/buildok/scheduler-test-tracepoints.stp
--- systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/buildok/scheduler-test-tracepoints.stp	2009-09-22 02:10:36.000000000 -0400
@@ -0,0 +1,53 @@
+#! stap -up4
+
+//Tests if all probes in the scheduler tapset are resolvable.
+
+probe scheduler.kthread_stop {
+	printf("pid = %d, priority = %d\n", thread_pid, thread_priority);
+}
+
+probe scheduler.kthread_stop.return {
+        printf("return value = %d\n", return_value);
+}
+
+probe scheduler.wait_task {
+        printf("pid = %d, priority = %d\n", task_pid, task_priority);
+}
+
+probe scheduler.wakeup {
+        printf("pid = %d, priority = %d\n, state = %d, cpu = %d, success = %d",task_pid, task_priority, task_state, task_cpu, success);
+}
+
+probe scheduler.wakeup_new {
+        printf("pid = %d, priority = %d, success = %d\n", task_pid, task_priority, success);
+}
+
+probe scheduler.ctxswitch {
+        printf("prev_pid = %d, prev_priority = %d, prev_state = %d, prev_task_name = %s, prev_tid = %d, next_pid = %d, next_priority = %d, next_state = %d, next_task_name = %s, next_tid = %d\n", prev_pid, prev_priority, prev_state, prev_task_name, prev_tid, next_pid, next_priority, next_state, next_task_name, next_tid);
+}
+
+probe scheduler.migrate_task {
+        printf("pid = %d, priority = %d, original cpu = %d destination cpu = %d\n", pid, priority, original_cpu, destination_cpu);
+}
+
+probe scheduler.process_free {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_exit {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_wait {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_fork {
+        printf("parent pid = %d, child pid = %d\n", parent_pid, child_pid);
+}
+
+probe scheduler.signal_send {
+        printf("pid = %d, signal = %d\n", pid, signal);
+}
+
+
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta	2009-09-22 02:18:03.000000000 -0400
@@ -0,0 +1,14 @@
+title: Display the task switches happeningt the scheduler
+name: sched_switch.stp
+version: 1.0
+author: kiran
+keywords: profiling functions
+subsystem: kernel
+status: production
+exit: user-controlled
+output: sorted-list on-exit
+scope: system-wide
+description: The sched_switch.stp script takes two arguments, first argument can be "pid" or "name" to indicate what is being passed as second argument. The script will trace the process based on pid/name and print the scheduler switches happening with the process. If no arguments are passed, it displays all the scheduler switches. This can be used to understand which tasks scheduler the current process being traced, out and when it gets scheduled in again.
+test_check: stap -p4 sched_switch.stp
+test_installcheck: stap  sched_switch.stp -c "sleep 1"
+
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-22 02:29:16.000000000 -0400
@@ -0,0 +1,71 @@
+/* This script wokrs similar to ftrace's sched_switch. It displays a list of
+ * processes which get switched in and out of the scheduler. The format of display
+ * is PROCESS_NAME PROCESS_PID CPU TIMESTAMP PID: PRIORITY: PROCESS STATE ->/+
+ *    NEXT_PID : NEXT_PRIORITY: NEXT_STATE NEXT_PROCESS_NAME 
+ * -> indicates that prev process is scheduled out and the next process is 
+ *    scheduled in.
+ * + indicates that prev process has woken up the next process.
+ * The usage is sched_switch.stp <"pid"/"name"> pid/name
+ */
+
+global task_cpu_old[9999]
+global pids[999]
+global processes
+global prev
+
+function state_calc(state) {
+        if(state == 0)
+        status = "R"
+        if(state == 1)
+        status = "S"
+        if(state == 2)
+        status = "D"
+        if(state == 4)
+        status = "T"
+        if(state == 8)
+        status = "T"
+        if(state == 16)
+        status = "Z"
+        if(state == 32)
+        status = "EXIT_DEAD"
+        return status
+}
+probe scheduler.wakeup
+{
+	pids[task_pid]++
+	processes[task_pid] = $p;
+	prev[task_pid] = task_current()
+	
+}
+probe scheduler.ctxswitch
+{
+	tid = next_tid
+	tid1 = prev_tid
+	state = prev_state
+	state1 = next_state
+	
+	%( $# == 2 %?
+	
+	if(@1 == "pid") 
+		if (tid != $2 && tid1 != $2)
+			next
+	if(@1 == "name")
+		if (task_execname(task_current()) != @2 && task_execname($next) != @2)
+               		next 
+	
+	foreach (name in pids-) {
+		if ((@1 == "pid" && (name == $2 || task_pid(prev[name]) == $2)) || 
+		   (@1 == "name" && (task_execname(prev[name]) == @2 || task_execname(processes[name]) == @2)))
+			printf("%s\t\t%d\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
+				task_execname(prev[name]), task_pid(prev[name]), task_cpu(processes[name]), gettimeofday_ns(), 
+				task_pid(prev[name]), task_prio(prev[name]), state_calc(task_state(prev[name])), 
+				task_pid(processes[name]), task_prio(processes[name]), state_calc(task_state(processes[name])), 
+				task_execname(processes[name]))
+	} %)
+
+	old_cpu = task_cpu_old[tid]
+	printf("%s\t\t%d\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",task_execname(task_current()),tid1,
+		old_cpu,gettimeofday_ns(),tid1,task_prio(task_current()),state_calc(state),next_pid,
+		next_prio,state_calc(next_state),next_task_name )
+	task_cpu_old[next_tid] = cpu()
+}


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints
  2009-09-23 14:13 [PATCH v2] Scheduler Tapset based on kernel tracepoints Kiran
@ 2009-09-25 20:20 ` Frank Ch. Eigler
  2009-09-25 22:52 ` Josh Stone
  1 sibling, 0 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2009-09-25 20:20 UTC (permalink / raw)
  To: Kiran; +Cc: systemtap


Kiran <kiran@linux.vnet.ibm.com> writes:

(Is that your full name, for purposes of being listed in AUTHORS?)


> [...]
> --- systemtap-orig/tapset/scheduler.stp	2009-09-19 10:27:14.000000000 -0400
> +++ systemtap/tapset/scheduler.stp	2009-09-22 07:32:04.000000000 -0400
> @@ -118,32 +118,280 @@
>   *  Arguments:
>   *    prev_pid: The pid of the process to be switched out
>   *    next_pid: The pid of the process to be switched in
> + *    prev_tid: The tid of the process to be switched out
> + *    next_tid: The tid of the process to be switched in	 
> + *    prev_task_name: The name of the process to be switched out
> + *    next_task_name: The name of the process to be switched in
>   *    prevtsk_state: the state of the process to be switched out
> + *    nexttsk_state: the state of the process to be switched in
>   */

Please recast these as kerneldoc-formatted embedded docs that can
be extracted into our man pages.


> -probe scheduler.ctxswitch =
> +
> +probe scheduler.ctxswitch.tp = kernel.trace("sched_switch") 
> +{
> +	next_pid =  $next->pid
> +        next_pid = task_tid($next)
> +        next_task = $next
> +        next_task_name = task_execname($next)
> +        nexttsk_state = $next->state
> +	prev_pid = $prev->pid
> +        prev_tid = task_tid(tid)
> +        prev_task = $prev
> +        prev_task_name = task_execname($prev)
> +        prevtsk_state = $prev->state
> +}

There's a consistent confusion between pid and tid here.
pid is $task->tgid;  tid is $task->pid.  (There is not much
need for using task_tid() function, certainly not in this
asymmetric way.)



> +/**
> + * probe scheduler.kthread_stop - Fires when a thread created by kthread_create is stopped.
> + * @thread_pid: pid of the thread being stopped.
> + * @thread_priority: priority of the thread.
> + */
> +probe scheduler.kthread_stop.kp = kernel.function("kthread_stop")
> +{
> +	thread_pid = $k->pid
> +	thread_priority = $k->priority
> +}
> +probe scheduler.kthread_stop.tp = kernel.trace("sched_kthread_stop") 
> +{
> +        thread_pid = $t->pid
> +        thread_priority = $t->prio
> +}
> +probe scheduler.kthread_stop 
> +   = scheduler.kthread_stop.tp !,
> +     scheduler.kthread_stop.kp
> +{}

This one looks just right!


> diff -Naur systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp systemtap/testsuite/buildok/scheduler-test-tracepoints.stp
> +#! stap -up4
> +
> +//Tests if all probes in the scheduler tapset are resolvable.
> +
> +probe scheduler.kthread_stop {
> +	printf("pid = %d, priority = %d\n", thread_pid, thread_priority);
> +}
> [...]

These look good (if they print out all the variables).


> [...]
> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-22 02:29:16.000000000 -0400
> @@ -0,0 +1,71 @@
> +/* This script wokrs similar to ftrace's sched_switch. It displays a list of
> + * processes which get switched in and out of the scheduler. The format of display
> + * is PROCESS_NAME PROCESS_PID CPU TIMESTAMP PID: PRIORITY: PROCESS STATE ->/+
> + *    NEXT_PID : NEXT_PRIORITY: NEXT_STATE NEXT_PROCESS_NAME 
> + * -> indicates that prev process is scheduled out and the next process is 
> + *    scheduled in.
> + * + indicates that prev process has woken up the next process.
> + * The usage is sched_switch.stp <"pid"/"name"> pid/name
> + */
> +
> +global task_cpu_old[9999]
> +global pids[999]
> [...]

Please write down whatever rationale exists for those two numbers.
Perhaps they could stick with defaults after the pid/tid confusion is
cleared up, and if perhaps you add some code to delete entries
belonging to processes as they die.

It'd be good to see a statement of what OS/kernel versions this
was tested on, and to see a sample of its output.

Thank you for your efforts!


- FChE

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints
  2009-09-23 14:13 [PATCH v2] Scheduler Tapset based on kernel tracepoints Kiran
  2009-09-25 20:20 ` Frank Ch. Eigler
@ 2009-09-25 22:52 ` Josh Stone
  2009-09-29  8:55   ` [PATCH v3] " Kiran
  1 sibling, 1 reply; 8+ messages in thread
From: Josh Stone @ 2009-09-25 22:52 UTC (permalink / raw)
  To: Kiran; +Cc: systemtap

Hi,

On 09/23/2009 07:13 AM, Kiran wrote:
> +probe scheduler.ctxswitch.tp = kernel.trace("sched_switch") 

These .tp and .kp variants are an internal detail -- please put them
under __scheduler so they are in a more private namespace from what's
exposed to the user.

> +/**
> + * probe scheduler.migrate_task - Traces the migration of the tasks across cpus by the scheduler.
> + * @pid: pid of the task being migrated.
> + * @priority: priority of the task being migrated.
> + * @original_cpu: the original cpu
> + * @destination_cpu: the destination cpu
> + */

There is already a "scheduler.migrate" defined, so please merge with
that instead of creating a new one.

> +/**
> + * probe scheduler.process_exit - Fires when a process exits
> + * @pid: pid of the process exiting
> + * @priority: priority of the process exiting
> + */

This overlaps with the less well-known "kprocess.exit".  I think in this
case we can make the kprocess one point to yours.

> +/**
> + * probe scheduler.process_fork - Probes the tracepoint for forking a process
> + * @parent_pid: PID of the parent process
> + * @child_pid: PID of the child process
> + */

The "kprocess.create" is also similar to this.

> +/**
> + * probe scheduler.signal_send - Probes the tracepoint for sending a signal
> + * @pid: pid of the process sending signal
> + * @signal_number: signal number
> + */

Here we have the whole signal.stp tapset, which includes a "signal.send"
probe point.

It's not easy to decide how to resolve the overlaps in different
tapsets, but please consider it.  At a minimum, they should leverage
each other so they're all using the best available probe points.


> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-22 02:29:16.000000000 -0400
> [...]
> +probe scheduler.wakeup
> +{
> +	pids[task_pid]++
> +	processes[task_pid] = $p;
> +	prev[task_pid] = task_current()
> +	
> +}

If this is just so you can print the "+" line of who wokeup whom, why
not print that right away?  Then you don't need those arrays at all.
Saving task_struct pointers is messy business.

> +probe scheduler.ctxswitch
> +{
> +	tid = next_tid
> +	tid1 = prev_tid
> +	state = prev_state
> +	state1 = next_state

Why copy these values?  The prev/next variable names already reflect
more meaning.

> +	
> +	%( $# == 2 %?
> +	
> +	if(@1 == "pid") 
> +		if (tid != $2 && tid1 != $2)
> +			next
> +	if(@1 == "name")
> +		if (task_execname(task_current()) != @2 && task_execname($next) != @2)
> +               		next 

Note that task_execname(task_current()) is the same as simply
execname().  But anyway, your tapset provides prev_task_name and
next_task_name, so just use those.

> +	
> +	foreach (name in pids-) {
> +		if ((@1 == "pid" && (name == $2 || task_pid(prev[name]) == $2)) || 
> +		   (@1 == "name" && (task_execname(prev[name]) == @2 || task_execname(processes[name]) == @2)))
> +			printf("%s\t\t%d\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> +				task_execname(prev[name]), task_pid(prev[name]), task_cpu(processes[name]), gettimeofday_ns(), 
> +				task_pid(prev[name]), task_prio(prev[name]), state_calc(task_state(prev[name])), 
> +				task_pid(processes[name]), task_prio(processes[name]), state_calc(task_state(processes[name])), 
> +				task_execname(processes[name]))
> +	} %)

Hmm, "name" is not a name, which is confusing.  But if you're looking
for the task which woke this one up, wouldn't that be prev[next_pid]?
(That should really be indexed by the tid, actually.)  Or, as I said
above, it may be easier to print this right in the wakeup probe.

> +
> +	old_cpu = task_cpu_old[tid]
> +	printf("%s\t\t%d\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",task_execname(task_current()),tid1,
> +		old_cpu,gettimeofday_ns(),tid1,task_prio(task_current()),state_calc(state),next_pid,
> +		next_prio,state_calc(next_state),next_task_name )
> +	task_cpu_old[next_tid] = cpu()
> +}

I think task_current() is probably always the same as prev_task, but
it's probably best to use the latter.  It would also be helpful if you
spaced this out so we can better see what's being printed.  I can see
two "tid1"s in there, which is a bit weird.

Thanks,

Josh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
  2009-09-25 22:52 ` Josh Stone
@ 2009-09-29  8:55   ` Kiran
  2009-09-29 20:19     ` Josh Stone
  0 siblings, 1 reply; 8+ messages in thread
From: Kiran @ 2009-09-29  8:55 UTC (permalink / raw)
  To: Josh Stone, Frank Ch. Eigler; +Cc: systemtap

Hi,

I have attached the next version of the patch. This incorporates most of
the comments given by Josh and Frank. This was tested on mainline linux
kernel 2.6.31 on ppc64 and x86_64. I have attached sample output for
ppc64.

Frank,
My full name is Kiran Prakash 

Josh,
I was not able to merge the signal and kprocess tapsets with my
scheduler tapset as my scheduler tapset probes the tracepoints mainly
and in case tracepoints are not present, it probes the kernel functions.
So I can define namespaces only for the variables exported by the
tracepoint.

Changelog
---------
 	- Included the kp and tp probes in __scheduler namespace
        - Merged the probes scheduler.migrate and scheduler.migrate_task
	- Modified the sched_switch example script to print the wakeup 
	  info in the wakeup probe itself. Hence not using any
	  variables.

Signed-off-by: Kiran Prakash <kiran@linux.vnet.ibm.com>

diff -Naur systemtap-orig/tapset/scheduler.stp systemtap/tapset/scheduler.stp
--- systemtap-orig/tapset/scheduler.stp	2009-09-19 10:27:14.000000000 -0400
+++ systemtap/tapset/scheduler.stp	2009-09-28 23:32:59.000000000 -0400
@@ -20,7 +20,7 @@
 %}
 
 
-/* probe scheduler.cpu_off
+/** probe scheduler.cpu_off
  *
  *  Fires when a process is about to stop running on a cpu.
  *
@@ -41,7 +41,7 @@
 }
 
 
-/* probe scheduler.cpu_on
+/** probe scheduler.cpu_on
  *
  *  Fires when a process is beginning execution on a cpu.
  *
@@ -76,26 +76,6 @@
     idle = __is_idle()
 }
 
-
-/* probe scheduler.migrate
- *
- *  Fires whenever a task is moved to a different cpu's runqueue.
- *
- * Context:
- *  Unknown (sometimes migration thread, sometimes cpu_to)
- *
- * Arguments:
- *  task - the process that is being migrated
- *  cpu_from - the cpu that is losing the task
- *  cpu_to - the cpu that is claiming the task
- */
-probe scheduler.migrate = kernel.function("pull_task")? {
-    task = $p
-    cpu_from = task_cpu($p) /*thread_info renamed to stack since 2.6.22*/
-    cpu_to = $this_cpu
-}
-
-
 /* probe scheduler.balance
  *
  *  Fires when a cpu attempts to find more work.
@@ -107,43 +87,314 @@
 probe scheduler.balance = kernel.function("idle_balance")? {}
 
 
-/* probe scheduler.ctxswitch 
- * 
- *  Fires when there is a context switch
-
- *  Currently systemTap can't access arguments of inline
- *  functions. So we choose to probe __switch_to instead
- *  of context_switch()       
-
- *  Arguments:
- *    prev_pid: The pid of the process to be switched out
- *    next_pid: The pid of the process to be switched in
- *    prevtsk_state: the state of the process to be switched out
+/** 
+ * probe scheduler.ctxswitch - Fires when there is a context switch. Currently
+ *			       systemTap can't access arguments of inline 
+ *			       functions. So we choose to probe __switch_to instead 
+ *			       of context_switch()       
+ * @prev_pid: The pid of the process to be switched out
+ * @next_pid: The pid of the process to be switched in
+ * @prev_tid: The tid of the process to be switched out
+ * @next_tid: The tid of the process to be switched in	 
+ * @prev_task_name: The name of the process to be switched out
+ * @next_task_name: The name of the process to be switched in
+ * @prev_priority: The priority of the process to be switched out
+ * @next_priority: The priority of the process to be switched in
+ * @prevtsk_state: the state of the process to be switched out
+ * @nexttsk_state: the state of the process to be switched in
  */
-probe scheduler.ctxswitch =
+
+probe __scheduler.ctxswitch.tp = kernel.trace("sched_switch") 
+{
+	next_pid =  $next->pid
+        next_tid = task_tid($next)
+        next_task = $next
+        next_task_name = task_execname($next)
+        nexttsk_state = $next->state
+	next_priority = $next->prio
+	prev_priority = $prev->prio
+	prev_pid = $prev->pid
+        prev_tid = task_tid($prev)
+        prev_task = $prev
+        prev_task_name = task_execname($prev)
+        prevtsk_state = $prev->state
+}
+
+probe __scheduler.ctxswitch.kp =
 %( arch != "x86_64" && arch != "ia64" %?
-	kernel.trace("sched_switch") !, kernel.function("__switch_to")
+        kernel.function("__switch_to")
 %:
-	kernel.trace("sched_switch") !, kernel.function("context_switch")
+        kernel.function("context_switch")
 %)
 {
 %( arch == "powerpc" %?
-        prev_pid = $prev->pid
-        next_pid =  $new->pid
+	prev_pid = $prev->pid
+        next_pid = $new->pid
+        prev_tid = task_tid($prev)
+        next_tid = task_tid($new)
         prev_task = $prev
         next_task = $new
+	next_priority = $new->prio
+        prev_priority = $prev->prio
+        prev_task_name = task_execname($prev)
+        next_task_name = task_execname($new)
         prevtsk_state = $prev->state
+        nexttsk_state = $new->state
+
 %: %( arch == "x86_64" || arch == "ia64" %?
         prev_pid = $prev->pid
-        next_pid =  $next->pid
+        next_pid = $next->pid
+	prev_tid = task_tid($prev)
+        next_tid = task_tid($next)
         prev_task = $prev
         next_task = $next
+	next_priority = $next->prio
+        prev_priority = $prev->prio
+	prev_task_name = task_execname($prev)
+        next_task_name = task_execname($next)
         prevtsk_state = $prev->state
+	nexttsk_state = $next->state
 %:
         prev_pid = $prev_p->pid
         next_pid = $next_p->pid
+	prev_tid = task_tid($prev_p)
+        next_tid = task_tid($next_p)
         prev_task = $prev_p
         next_task = $next_p
+	next_priority = $next_p->prio
+        prev_priority = $prev_p->prio
+	prev_task_name = task_execname($prev_p)
+        next_task_name = task_execname($next_p)
         prevtsk_state = $prev_p->state
+	nexttsk_state = $next_p->state
 %) %)
 }
+
+probe scheduler.ctxswitch
+ = __scheduler.ctxswitch.tp !, __scheduler.ctxswitch.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop - Fires when a thread created by kthread_create is stopped.
+ * @thread_pid: pid of the thread being stopped.
+ * @thread_priority: priority of the thread.
+ */
+probe __scheduler.kthread_stop.kp = kernel.function("kthread_stop")
+{
+	thread_pid = $k->pid
+	thread_priority = $k->priority
+}
+probe __scheduler.kthread_stop.tp = kernel.trace("sched_kthread_stop") 
+{
+        thread_pid = $t->pid
+        thread_priority = $t->prio
+}
+probe scheduler.kthread_stop 
+   = __scheduler.kthread_stop.tp !,
+     __scheduler.kthread_stop.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop.return - Fires once the kthread is stopped and gets the return value
+ * @return_value: return value after stopping the thread.
+ */
+
+probe __scheduler.kthread_stop.return.kp = kernel.function("kthread_stop").return
+{
+	return_value = $k->exit_code
+}
+probe __scheduler.kthread_stop.return.tp = kernel.trace("sched_kthread_stop_ret")
+{
+        return_value = $ret
+}
+
+probe scheduler.kthread_stop.return 
+ = __scheduler.kthread_stop.return.tp !,
+   __scheduler.kthread_stop.return.kp
+{}
+
+/**
+ * probe scheduler.wait_task - Fires when waiting on a task to unschedule. 
+ *                             It waits till the task becomes inactive.
+ * @task_pid: pid of the task the scheduler is waiting on.
+ * @task_priority: priority of the task
+ */
+
+probe scheduler.wait_task
+ = kernel.trace("sched_wait_task") !,
+   kernel.function("wait_task_inactive")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+}
+
+/**
+ * probe scheduler.wakeup - Fires when a task is woken up 
+ * @task_pid: pid of the task being woken up
+ * @task_priority: priority of the task being woken up
+ * @task_cpu: cpu of the task being woken up
+ * @task_state: state of the task being woken up
+ * @task_tid: tid of the task being woken up
+ */
+
+probe scheduler.wakeup
+ = kernel.trace("sched_wakeup") !,
+   kernel.function("try_to_wake_up")
+{
+	task = $p
+        task_pid = $p->pid
+	task_tid = task_tid($p)
+        task_priority = $p->prio
+	task_cpu = task_cpu($p)
+	task_state = task_state($p)
+}
+
+/**
+ * probe scheduler.wakeup_new - Fires when a newly created task is woken up for the first time
+ * @task_pid: pid of the new task woken up
+ * @task_priority: priority of the new task
+ * @task_tid: tid of the new task woken up
+ * @task_state: state of the task woken up
+ * @task_cpu: cpu of the task woken up
+ */
+probe scheduler.wakeup_new
+ = kernel.trace("sched_wakeup_new") !,
+   kernel.function("wake_up_new_task")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+	task_cpu = task_cpu($p)
+        task_state = task_state($p)
+	task = $p
+	task_tid = task_tid($p)
+}
+
+/**
+ * probe scheduler.migrate - Traces the migration of the tasks across cpus by the scheduler.
+ * @task: the process that is being migrated.
+ * @pid: pid of the task being migrated.
+ * @priority: priority of the task being migrated.
+ * @cpu_from: the original cpu
+ * @cpu_to: the destination cpu
+ */
+probe __scheduler.migrate.kp1 = kernel.function("pull_task")
+{
+	cpu_to = $this_cpu
+}
+probe __scheduler.migrate.kp = kernel.function("set_task_cpu")
+{
+	cpu_to = $new_cpu
+}
+probe __scheduler.migrate.tp = kernel.trace("sched_migrate_task")
+{
+	cpu_to = $dest_cpu
+}
+probe scheduler.migrate
+ = __scheduler.migrate.tp !,
+   __scheduler.migrate.kp !,
+   __scheduler.migrate.kp1
+{
+	task = $p
+	pid = $p->pid
+        priority = $p->prio
+        cpu_from = task_cpu($p)
+}
+/**
+ * probe scheduler.process_free - Traces the process of freeing up of a process
+ * @pid: PID of the process getting freed
+ * @priority: priority of the process getting freed
+ */
+probe __scheduler.process_free.kp = kernel.function("delayed_put_task_struct")
+{
+	pid = $tsk->pid
+	priority = $tsk->prio
+}
+probe __scheduler.process_free.tp = kernel.trace("sched_process_free")
+{
+        pid = $p->pid
+        priority = $p->prio
+}
+probe scheduler.process_free
+ = __scheduler.process_free.tp !,
+   __scheduler.process_free.kp
+{}
+
+/**
+ * probe scheduler.process_exit - Fires when a process exits
+ * @pid: pid of the process exiting
+ * @priority: priority of the process exiting
+ */
+probe __scheduler.process_exit.kp = kernel.function("do_exit")
+{
+	pid = $tsk->pid
+	priority = $tsk->priority
+}
+probe __scheduler.process_exit.tp = kernel.trace("sched_process_exit")
+{
+        pid = $p->pid
+        priority = $p->prio
+}
+
+probe scheduler.process_exit 
+ = __scheduler.process_exit.tp !,
+   __scheduler.process_exit.kp
+{}
+
+/**
+ * probe scheduler.process_wait - Fires when scheduler waits on a process
+ * @pid: PID of the process scheduler is waiting on
+ */
+probe __scheduler.process_wait.kp = kernel.function("do_wait")
+{
+	pid = $wo->wo_pid
+}
+probe __scheduler.process_wait.tp = kernel.trace("sched_process_wait")
+{
+        pid = $pid
+}
+probe scheduler.process_wait
+ = __scheduler.process_wait.tp !,
+   __scheduler.process_wait.kp
+{}
+
+/**
+ * probe scheduler.process_fork - Probes the tracepoint for forking a process
+ * @parent_pid: PID of the parent process
+ * @child_pid: PID of the child process
+ */
+probe __scheduler.process_fork.kp = kernel.function("do_fork")
+{
+	parent_pid = $current->pid
+	child_pid = $p->pid
+}
+probe __scheduler.process_fork.tp = kernel.trace("sched_process_fork")
+{
+        parent_pid = $parent->pid
+        child_pid = $child->pid
+}
+
+probe scheduler.process_fork
+ = __scheduler.process_fork.tp !,
+   __scheduler.process_fork.kp
+{}
+/**
+ * probe scheduler.signal_send - Probes the tracepoint for sending a signal
+ * @pid: pid of the process sending signal
+ * @signal_number: signal number
+ */
+probe __scheduler.signal_send.kp = kernel.function("__send_signal")
+{
+	pid = $t->pid
+}
+probe __scheduler.signal_send.tp = kernel.trace("sched_signal_send")
+{
+        pid = $p->pid
+}
+probe scheduler.signal_send
+ = __scheduler.signal_send.tp !,
+   __scheduler.signal_send.kp
+{
+	signal_number = $sig
+}
diff -Naur systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp systemtap/testsuite/buildok/scheduler-test-tracepoints.stp
--- systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/buildok/scheduler-test-tracepoints.stp	2009-09-29 00:15:10.000000000 -0400
@@ -0,0 +1,51 @@
+#! stap -up4
+
+//Tests if all probes in the scheduler tapset are resolvable.
+
+probe scheduler.kthread_stop {
+	printf("pid = %d, priority = %d\n", thread_pid, thread_priority);
+}
+
+probe scheduler.kthread_stop.return {
+        printf("return value = %d\n", return_value);
+}
+
+probe scheduler.wait_task {
+        printf("pid = %d, priority = %d\n", task_pid, task_priority);
+}
+
+probe scheduler.wakeup {
+        printf("pid = %d, priority = %d, state = %d, cpu = %d, tid = %d\n",task_pid, task_priority, task_state, task_cpu, task_tid);
+}
+
+probe scheduler.wakeup_new {
+        printf("pid = %d, priority = %d, state = %d, cpu = %d, tid = %d\n", task_pid, task_priority, task_state, task_cpu, task_tid);
+}
+
+probe scheduler.ctxswitch {
+        printf("prev_pid = %d, prev_priority = %d, prev_state = %d, prev_task_name = %s, prev_tid = %d, next_pid = %d, next_priority = %d, next_state = %d, next_task_name = %s, next_tid = %d\n", prev_pid, prev_priority, prevtsk_state, prev_task_name, prev_tid, next_pid, next_priority, nexttsk_state, next_task_name, next_tid);
+}
+
+probe scheduler.migrate {
+        printf("pid = %d, priority = %d, original cpu = %d destination cpu = %d\n", pid, priority, cpu_from, cpu_to);
+}
+
+probe scheduler.process_free {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_exit {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_wait {
+        printf("pid = %d\n", pid);
+}
+
+probe scheduler.process_fork {
+        printf("parent pid = %d, child pid = %d\n", parent_pid, child_pid);
+}
+
+probe scheduler.signal_send {
+        printf("pid = %d, signal = %d\n", pid, signal_number);
+}
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta	2009-09-22 02:18:03.000000000 -0400
@@ -0,0 +1,14 @@
+title: Display the task switches happeningt the scheduler
+name: sched_switch.stp
+version: 1.0
+author: kiran
+keywords: profiling functions
+subsystem: kernel
+status: production
+exit: user-controlled
+output: sorted-list on-exit
+scope: system-wide
+description: The sched_switch.stp script takes two arguments, first argument can be "pid" or "name" to indicate what is being passed as second argument. The script will trace the process based on pid/name and print the scheduler switches happening with the process. If no arguments are passed, it displays all the scheduler switches. This can be used to understand which tasks scheduler the current process being traced, out and when it gets scheduled in again.
+test_check: stap -p4 sched_switch.stp
+test_installcheck: stap  sched_switch.stp -c "sleep 1"
+
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-29 00:27:58.000000000 -0400
@@ -0,0 +1,68 @@
+/* This script wokrs similar to ftrace's sched_switch. It displays a list of
+ * processes which get switched in and out of the scheduler. The format of display
+ * is PROCESS_NAME PROCESS_PID CPU TIMESTAMP PID: PRIORITY: PROCESS STATE ->/+
+ *    NEXT_PID : NEXT_PRIORITY: NEXT_STATE NEXT_PROCESS_NAME 
+ * -> indicates that prev process is scheduled out and the next process is 
+ *    scheduled in.
+ * + indicates that prev process has woken up the next process.
+ * The usage is sched_switch.stp <"pid"/"name"> pid/name
+ */
+
+function state_calc(state) {
+        if(state == 0)
+        status = "R"
+        if(state == 1)
+        status = "S"
+        if(state == 2)
+        status = "D"
+        if(state == 4)
+        status = "T"
+        if(state == 8)
+        status = "T"
+        if(state == 16)
+        status = "Z"
+        if(state == 32)
+        status = "EXIT_DEAD"
+        return status
+}
+probe scheduler.wakeup
+{
+	 %( $# == 2 %?
+
+        if(@1 == "pid")
+                if (task_tid != $2 && task_tid(task_current()) != $2)
+                        next
+        if(@1 == "name")
+                if (task_execname(task) != @2 && execname() != @2)
+                        next
+
+        
+                if ((@1 == "pid" && (task_tid == $2 || task_pid(task_current()) == $2)) ||
+                   (@1 == "name" && (task_execname(task_current()) == @2 || task_execname(task) == @2)))
+                        printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
+                                execname(), task_cpu(task), gettimeofday_ns(),
+                                task_tid(task_current()), task_prio(task_current()), state_calc(task_state(task_current())),
+                                task_tid(task), task_prio(task), state_calc(task_state(task)),
+                                task_execname(task))
+         %)
+
+
+	
+}
+probe scheduler.ctxswitch
+{
+	%( $# == 2 %?
+	
+	if(@1 == "pid") 
+		if (next_tid != $2 && prev_tid != $2)
+			next
+	if(@1 == "name")
+		if (prev_task_name != @2 && next_task_name != @2)
+               		next 
+	
+	%)
+
+	printf("%s\t\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",prev_task_name,
+		task_cpu(prev_task),gettimeofday_ns(),prev_tid,prev_priority,state_calc(prevtsk_state),next_tid,
+		next_priority,state_calc(nexttsk_state),next_task_name)
+}


Output
------
Output of the sched_switch name kjournald

syslogd         2       1254199017432933355     3395:20:R + 1338:15:S kjournald
syslogd         2       1254199017432952481     3395:120:D ==> 1338:115:R kjournald
kjournald               2       1254199017433168831     1338:115:D ==> 3398:120:R klogd
swapper         2       1254199017445924281     0:40:R + 1338:15:D kjournald
swapper         2       1254199017445949512     0:140:R ==> 1338:115:R kjournald
kjournald               2       1254199017446019858     1338:15:R + 423:15:S kblockd/2
kjournald               2       1254199017446035171     1338:115:D ==> 423:115:R kblockd/2
swapper         2       1254199017451967116     0:40:R + 1338:15:D kjournald
swapper         2       1254199017451989107     0:140:R ==> 1338:115:R kjournald
kjournald               2       1254199017452027795     1338:15:R + 3395:20:D syslogd
kjournald               2       1254199017452042046     1338:115:S ==> 3395:120:R syslogd
syslogd         2       1254199017461421370     3395:20:R + 1338:15:S kjournald
syslogd         2       1254199017461435964     3395:120:D ==> 1338:115:R kjournald
kjournald               2       1254199017461480827     1338:115:D ==> 3398:120:R klogd
swapper         2       1254199017476086266     0:40:R + 1338:15:D kjournald
swapper         2       1254199017476107523     0:140:R ==> 1338:115:R kjournald
kjournald               2       1254199017476137236     1338:115:D ==> 0:140:R swapper
swapper         2       1254199017482149965     0:40:R + 1338:15:D kjournald
-------------------------------------------------------------------------------------------------

Output of the scheduler-test-tracepoints.stp testcase

prev_pid = 29148, prev_priority = 120, prev_state = 1, prev_task_name = stapio, prev_tid = 29148, next_pid = 0, next_priority = 140, next_state = 0, next_task_name = swapper, next_tid = 0
pid = 3398, priority = 120, original cpu = 2 destination cpu = 3
pid = 3398, priority = 120, state = 1, cpu = 3, tid = 3398
prev_pid = 0, prev_priority = 140, prev_state = 0, prev_task_name = swapper, prev_tid = 0, next_pid = 3398, next_priority = 120, next_state = 0, next_task_name = klogd, next_tid = 3398






-- 

Thanks,
Kiran

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
  2009-09-29  8:55   ` [PATCH v3] " Kiran
@ 2009-09-29 20:19     ` Josh Stone
  2009-10-01 17:09       ` Kiran
  0 siblings, 1 reply; 8+ messages in thread
From: Josh Stone @ 2009-09-29 20:19 UTC (permalink / raw)
  To: Kiran; +Cc: Frank Ch. Eigler, systemtap

On 09/29/2009 01:54 AM, Kiran wrote:
> Hi,
> 
> I have attached the next version of the patch. This incorporates most of
> the comments given by Josh and Frank. This was tested on mainline linux
> kernel 2.6.31 on ppc64 and x86_64. I have attached sample output for
> ppc64.
> 
> Frank,
> My full name is Kiran Prakash 
> 
> Josh,
> I was not able to merge the signal and kprocess tapsets with my
> scheduler tapset as my scheduler tapset probes the tracepoints mainly
> and in case tracepoints are not present, it probes the kernel functions.
> So I can define namespaces only for the variables exported by the
> tracepoint.

Understood, thanks for looking.

Frank's comment on mixing up pid/tid still needs to be addressed, e.g.:

> +probe scheduler.wakeup
> + = kernel.trace("sched_wakeup") !,
> +   kernel.function("try_to_wake_up")
> +{
> +	task = $p
> +        task_pid = $p->pid
> +	task_tid = task_tid($p)
> +        task_priority = $p->prio
> +	task_cpu = task_cpu($p)
> +	task_state = task_state($p)
> +}

Note that $p->pid is identical to task_tid($p).  What we really want
here and everywhere else is:

  task_pid = $p->tgid
  task_tid = $p->pid

(Also a formatting nitpick -- there's a mix of tabs and spaces in here
that should be more consistent.)

> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-29 00:27:58.000000000 -0400
> @@ -0,0 +1,68 @@
> +/* This script wokrs similar to ftrace's sched_switch. It displays a list of

s/wokrs/works/

> +probe scheduler.wakeup
> +{
> +	 %( $# == 2 %?
> +
> +        if(@1 == "pid")
> +                if (task_tid != $2 && task_tid(task_current()) != $2)
> +                        next

task_tid(task_current()) => tid()

However, if the filter is named "pid", shouldn't the comparison be done
by pid instead of tid?

> +        if(@1 == "name")
> +                if (task_execname(task) != @2 && execname() != @2)
> +                        next
> +
> +        
> +                if ((@1 == "pid" && (task_tid == $2 || task_pid(task_current()) == $2)) ||
> +                   (@1 == "name" && (task_execname(task_current()) == @2 || task_execname(task) == @2)))

This condition appears to already be guaranteed by the two filter checks
above.

> +                        printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> +                                execname(), task_cpu(task), gettimeofday_ns(),
> +                                task_tid(task_current()), task_prio(task_current()), state_calc(task_state(task_current())),
> +                                task_tid(task), task_prio(task), state_calc(task_state(task)),
> +                                task_execname(task))
> +         %)

The way you've placed this %) means that nothing will print if no filter
arguments were used.  Is that intentional?  At least, it's not
consistent with the printing in your scheduler.ctxswitch probe...

> +probe scheduler.ctxswitch
> +{
> +	%( $# == 2 %?
> +	
> +	if(@1 == "pid") 
> +		if (next_tid != $2 && prev_tid != $2)
> +			next

This has the same confusion of reading "pid" but comparing tids...

> +	printf("%s\t\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",prev_task_name,
> +		task_cpu(prev_task),gettimeofday_ns(),prev_tid,prev_priority,state_calc(prevtsk_state),next_tid,
> +		next_priority,state_calc(nexttsk_state),next_task_name)
[...]
> swapper         2       1254199017476107523     0:140:R ==> 1338:115:R kjournald
> kjournald               2       1254199017476137236     1338:115:D ==> 0:140:R swapper
> swapper         2       1254199017482149965     0:40:R + 1338:15:D kjournald

Are the tabs significant to you for parsing this output?  Visually, this
would line up better if you used field widths for spacing.
(e.g. %-16s for execnames).

Thanks,

Josh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
  2009-09-29 20:19     ` Josh Stone
@ 2009-10-01 17:09       ` Kiran
  2009-10-01 17:58         ` Josh Stone
  0 siblings, 1 reply; 8+ messages in thread
From: Kiran @ 2009-10-01 17:09 UTC (permalink / raw)
  To: Josh Stone; +Cc: Frank Ch. Eigler, systemtap

Hi Josh,

Thanks a lot for your comments. I am sending version 4 of the patch
incorporating your comments.

Changelog
---------
	o Changed pid to $p->tgid and tid to $p->pid
	o Made formatting changes
	o Removed the redundant condition in the example script.

Signed-off-by: Kiran Prakash <kiran@linux.vnet.ibm.com>

diff -Naur systemtap-orig/tapset/scheduler.stp systemtap/tapset/scheduler.stp
--- systemtap-orig/tapset/scheduler.stp	2009-09-19 09:27:14.000000000 -0500
+++ systemtap/tapset/scheduler.stp	2009-10-01 11:16:25.000000000 -0500
@@ -20,7 +20,7 @@
 %}
 
 
-/* probe scheduler.cpu_off
+/** probe scheduler.cpu_off
  *
  *  Fires when a process is about to stop running on a cpu.
  *
@@ -41,7 +41,7 @@
 }
 
 
-/* probe scheduler.cpu_on
+/** probe scheduler.cpu_on
  *
  *  Fires when a process is beginning execution on a cpu.
  *
@@ -76,26 +76,6 @@
     idle = __is_idle()
 }
 
-
-/* probe scheduler.migrate
- *
- *  Fires whenever a task is moved to a different cpu's runqueue.
- *
- * Context:
- *  Unknown (sometimes migration thread, sometimes cpu_to)
- *
- * Arguments:
- *  task - the process that is being migrated
- *  cpu_from - the cpu that is losing the task
- *  cpu_to - the cpu that is claiming the task
- */
-probe scheduler.migrate = kernel.function("pull_task")? {
-    task = $p
-    cpu_from = task_cpu($p) /*thread_info renamed to stack since 2.6.22*/
-    cpu_to = $this_cpu
-}
-
-
 /* probe scheduler.balance
  *
  *  Fires when a cpu attempts to find more work.
@@ -107,43 +87,314 @@
 probe scheduler.balance = kernel.function("idle_balance")? {}
 
 
-/* probe scheduler.ctxswitch 
- * 
- *  Fires when there is a context switch
-
- *  Currently systemTap can't access arguments of inline
- *  functions. So we choose to probe __switch_to instead
- *  of context_switch()       
-
- *  Arguments:
- *    prev_pid: The pid of the process to be switched out
- *    next_pid: The pid of the process to be switched in
- *    prevtsk_state: the state of the process to be switched out
+/** 
+ * probe scheduler.ctxswitch - Fires when there is a context switch. Currently
+ *			       systemTap can't access arguments of inline 
+ *			       functions. So we choose to probe __switch_to instead 
+ *			       of context_switch()       
+ * @prev_pid: The pid of the process to be switched out
+ * @next_pid: The pid of the process to be switched in
+ * @prev_tid: The tid of the process to be switched out
+ * @next_tid: The tid of the process to be switched in	 
+ * @prev_task_name: The name of the process to be switched out
+ * @next_task_name: The name of the process to be switched in
+ * @prev_priority: The priority of the process to be switched out
+ * @next_priority: The priority of the process to be switched in
+ * @prevtsk_state: the state of the process to be switched out
+ * @nexttsk_state: the state of the process to be switched in
  */
-probe scheduler.ctxswitch =
+
+probe __scheduler.ctxswitch.tp = kernel.trace("sched_switch") 
+{
+	next_pid =  $next->tgid
+	next_tid = $next->pid
+	next_task = $next
+	next_task_name = task_execname($next)
+	nexttsk_state = $next->state
+	next_priority = $next->prio
+	prev_priority = $prev->prio
+	prev_pid = $prev->tgid
+	prev_tid = $prev->pid
+	prev_task	=	$prev
+	prev_task_name	=	task_execname($prev)
+	prevtsk_state = $prev->state
+}
+
+probe __scheduler.ctxswitch.kp =
 %( arch != "x86_64" && arch != "ia64" %?
-	kernel.trace("sched_switch") !, kernel.function("__switch_to")
+	kernel.function("__switch_to")
 %:
-	kernel.trace("sched_switch") !, kernel.function("context_switch")
+	kernel.function("context_switch")
 %)
 {
 %( arch == "powerpc" %?
-        prev_pid = $prev->pid
-        next_pid =  $new->pid
-        prev_task = $prev
-        next_task = $new
-        prevtsk_state = $prev->state
+	prev_pid = $prev->tgid
+	next_pid = $new->tgid
+	prev_tid = $prev->pid
+	next_tid = $new->pid
+	prev_task = $prev
+	next_task = $new
+	next_priority = $new->prio
+	prev_priority = $prev->prio
+	prev_task_name = task_execname($prev)
+	next_task_name = task_execname($new)
+	prevtsk_state = $prev->state
+	nexttsk_state = $new->state
+
 %: %( arch == "x86_64" || arch == "ia64" %?
-        prev_pid = $prev->pid
-        next_pid =  $next->pid
-        prev_task = $prev
-        next_task = $next
-        prevtsk_state = $prev->state
+	prev_pid = $prev->tgid
+	next_pid = $next->tgid
+	prev_tid = $prev->pid
+	next_tid = $next->pid
+	prev_task = $prev
+	next_task = $next
+	next_priority = $next->prio
+	prev_priority = $prev->prio
+	prev_task_name = task_execname($prev)
+	next_task_name = task_execname($next)
+	prevtsk_state = $prev->state
+	nexttsk_state = $next->state
 %:
-        prev_pid = $prev_p->pid
-        next_pid = $next_p->pid
-        prev_task = $prev_p
-        next_task = $next_p
-        prevtsk_state = $prev_p->state
+	prev_pid = $prev_p->tgid
+	next_pid = $next_p->tgid
+	prev_tid = $prev_p->pid
+	next_tid = $next_p->pid
+	prev_task = $prev_p
+	next_task = $next_p
+	next_priority = $next_p->prio
+	prev_priority = $prev_p->prio
+	prev_task_name = task_execname($prev_p)
+	next_task_name = task_execname($next_p)
+	prevtsk_state = $prev_p->state
+	nexttsk_state = $next_p->state
 %) %)
 }
+
+probe scheduler.ctxswitch
+ = __scheduler.ctxswitch.tp !, __scheduler.ctxswitch.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop - Fires when a thread created by kthread_create is stopped.
+ * @thread_pid: pid of the thread being stopped.
+ * @thread_priority: priority of the thread.
+ */
+probe __scheduler.kthread_stop.kp = kernel.function("kthread_stop")
+{
+	thread_pid = $k->tgid
+	thread_priority = $k->priority
+}
+probe __scheduler.kthread_stop.tp = kernel.trace("sched_kthread_stop") 
+{
+	thread_pid = $t->tgid
+	thread_priority = $t->prio
+}
+probe scheduler.kthread_stop 
+   = __scheduler.kthread_stop.tp !,
+     __scheduler.kthread_stop.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop.return - Fires once the kthread is stopped and gets the return value
+ * @return_value: return value after stopping the thread.
+ */
+
+probe __scheduler.kthread_stop.return.kp = kernel.function("kthread_stop").return
+{
+	return_value = $k->exit_code
+}
+probe __scheduler.kthread_stop.return.tp = kernel.trace("sched_kthread_stop_ret")
+{
+	return_value = $ret
+}
+
+probe scheduler.kthread_stop.return 
+ = __scheduler.kthread_stop.return.tp !,
+   __scheduler.kthread_stop.return.kp
+{}
+
+/**
+ * probe scheduler.wait_task - Fires when waiting on a task to unschedule. 
+ *                             It waits till the task becomes inactive.
+ * @task_pid: pid of the task the scheduler is waiting on.
+ * @task_priority: priority of the task
+ */
+
+probe scheduler.wait_task
+ = kernel.trace("sched_wait_task") !,
+   kernel.function("wait_task_inactive")
+{
+	task_pid = $p->tgid
+	task_priority = $p->prio
+}
+
+/**
+ * probe scheduler.wakeup - Fires when a task is woken up 
+ * @task_pid: pid of the task being woken up
+ * @task_priority: priority of the task being woken up
+ * @task_cpu: cpu of the task being woken up
+ * @task_state: state of the task being woken up
+ * @task_tid: tid of the task being woken up
+ */
+
+probe scheduler.wakeup
+ = kernel.trace("sched_wakeup") !,
+   kernel.function("try_to_wake_up")
+{
+	task = $p
+	task_pid = $p->tgid
+	task_tid = $p->pid
+	task_priority = $p->prio
+	task_cpu = task_cpu($p)
+	task_state = task_state($p)
+}
+
+/**
+ * probe scheduler.wakeup_new - Fires when a newly created task is woken up for the first time
+ * @task_pid: pid of the new task woken up
+ * @task_priority: priority of the new task
+ * @task_tid: tid of the new task woken up
+ * @task_state: state of the task woken up
+ * @task_cpu: cpu of the task woken up
+ */
+probe scheduler.wakeup_new
+ = kernel.trace("sched_wakeup_new") !,
+   kernel.function("wake_up_new_task")
+{
+	task_pid = $p->tgid
+	task_priority = $p->prio
+	task_cpu = task_cpu($p)
+	task_state = task_state($p)
+	task = $p
+	task_tid = $p->pid
+}
+
+/**
+ * probe scheduler.migrate - Traces the migration of the tasks across cpus by the scheduler.
+ * @task: the process that is being migrated.
+ * @pid: pid of the task being migrated.
+ * @priority: priority of the task being migrated.
+ * @cpu_from: the original cpu
+ * @cpu_to: the destination cpu
+ */
+probe __scheduler.migrate.kp1 = kernel.function("pull_task")
+{
+	cpu_to = $this_cpu
+}
+probe __scheduler.migrate.kp = kernel.function("set_task_cpu")
+{
+	cpu_to = $new_cpu
+}
+probe __scheduler.migrate.tp = kernel.trace("sched_migrate_task")
+{
+	cpu_to = $dest_cpu
+}
+probe scheduler.migrate
+ = __scheduler.migrate.tp !,
+   __scheduler.migrate.kp !,
+   __scheduler.migrate.kp1
+{
+	task = $p
+	pid = $p->tgid
+	priority = $p->prio
+	cpu_from = task_cpu($p)
+}
+/**
+ * probe scheduler.process_free - Traces the process of freeing up of a process
+ * @pid: PID of the process getting freed
+ * @priority: priority of the process getting freed
+ */
+probe __scheduler.process_free.kp = kernel.function("delayed_put_task_struct")
+{
+	pid = $tsk->tgid
+	priority = $tsk->prio
+}
+probe __scheduler.process_free.tp = kernel.trace("sched_process_free")
+{
+	pid = $p->tgid
+	priority = $p->prio
+}
+probe scheduler.process_free
+ = __scheduler.process_free.tp !,
+   __scheduler.process_free.kp
+{}
+
+/**
+ * probe scheduler.process_exit - Fires when a process exits
+ * @pid: pid of the process exiting
+ * @priority: priority of the process exiting
+ */
+probe __scheduler.process_exit.kp = kernel.function("do_exit")
+{
+	pid = $tsk->tgid
+	priority = $tsk->priority
+}
+probe __scheduler.process_exit.tp = kernel.trace("sched_process_exit")
+{
+	pid = $p->tgid
+	priority = $p->prio
+}
+
+probe scheduler.process_exit 
+ = __scheduler.process_exit.tp !,
+   __scheduler.process_exit.kp
+{}
+
+/**
+ * probe scheduler.process_wait - Fires when scheduler waits on a process
+ * @pid: PID of the process scheduler is waiting on
+ */
+probe __scheduler.process_wait.kp = kernel.function("do_wait")
+{
+	pid = $wo->wo_pid
+}
+probe __scheduler.process_wait.tp = kernel.trace("sched_process_wait")
+{
+	pid = $pid
+}
+probe scheduler.process_wait
+ = __scheduler.process_wait.tp !,
+   __scheduler.process_wait.kp
+{}
+
+/**
+ * probe scheduler.process_fork - Probes the tracepoint for forking a process
+ * @parent_pid: PID of the parent process
+ * @child_pid: PID of the child process
+ */
+probe __scheduler.process_fork.kp = kernel.function("do_fork")
+{
+	parent_pid = $current->tgid
+	child_pid = $p->tgid
+}
+probe __scheduler.process_fork.tp = kernel.trace("sched_process_fork")
+{
+	parent_pid = $parent->tgid
+	child_pid = $child->tgid
+}
+
+probe scheduler.process_fork
+ = __scheduler.process_fork.tp !,
+   __scheduler.process_fork.kp
+{}
+/**
+ * probe scheduler.signal_send - Probes the tracepoint for sending a signal
+ * @pid: pid of the process sending signal
+ * @signal_number: signal number
+ */
+probe __scheduler.signal_send.kp = kernel.function("__send_signal")
+{
+	pid = $t->tgid
+}
+probe __scheduler.signal_send.tp = kernel.trace("sched_signal_send")
+{
+	pid = $p->tgid
+}
+probe scheduler.signal_send
+ = __scheduler.signal_send.tp !,
+   __scheduler.signal_send.kp
+{
+	signal_number = $sig
+}
diff -Naur systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp systemtap/testsuite/buildok/scheduler-test-tracepoints.stp
--- systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp	1969-12-31 18:00:00.000000000 -0600
+++ systemtap/testsuite/buildok/scheduler-test-tracepoints.stp	2009-09-28 23:15:10.000000000 -0500
@@ -0,0 +1,51 @@
+#! stap -up4
+
+//Tests if all probes in the scheduler tapset are resolvable.
+
+probe scheduler.kthread_stop {
+	printf("pid = %d, priority = %d\n", thread_pid, thread_priority);
+}
+
+probe scheduler.kthread_stop.return {
+        printf("return value = %d\n", return_value);
+}
+
+probe scheduler.wait_task {
+        printf("pid = %d, priority = %d\n", task_pid, task_priority);
+}
+
+probe scheduler.wakeup {
+        printf("pid = %d, priority = %d\n, state = %d, cpu = %d, tid = %d\n",task_pid, task_priority, task_state, task_cpu, task_tid);
+}
+
+probe scheduler.wakeup_new {
+        printf("pid = %d, priority = %d, state = %d, cpu = %d, tid = %d\n", task_pid, task_priority, task_state, task_cpu, task_tid);
+}
+
+probe scheduler.ctxswitch {
+        printf("prev_pid = %d, prev_priority = %d, prev_state = %d, prev_task_name = %s, prev_tid = %d, next_pid = %d, next_priority = %d, next_state = %d, next_task_name = %s, next_tid = %d\n", prev_pid, prev_priority, prevtsk_state, prev_task_name, prev_tid, next_pid, next_priority, nexttsk_state, next_task_name, next_tid);
+}
+
+probe scheduler.migrate {
+        printf("pid = %d, priority = %d, original cpu = %d destination cpu = %d\n", pid, priority, cpu_from, cpu_to);
+}
+
+probe scheduler.process_free {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_exit {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_wait {
+        printf("pid = %d\n", pid);
+}
+
+probe scheduler.process_fork {
+        printf("parent pid = %d, child pid = %d\n", parent_pid, child_pid);
+}
+
+probe scheduler.signal_send {
+        printf("pid = %d, signal = %d\n", pid, signal_number);
+}
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta	1969-12-31 18:00:00.000000000 -0600
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta	2009-09-22 01:18:03.000000000 -0500
@@ -0,0 +1,14 @@
+title: Display the task switches happeningt the scheduler
+name: sched_switch.stp
+version: 1.0
+author: kiran
+keywords: profiling functions
+subsystem: kernel
+status: production
+exit: user-controlled
+output: sorted-list on-exit
+scope: system-wide
+description: The sched_switch.stp script takes two arguments, first argument can be "pid" or "name" to indicate what is being passed as second argument. The script will trace the process based on pid/name and print the scheduler switches happening with the process. If no arguments are passed, it displays all the scheduler switches. This can be used to understand which tasks scheduler the current process being traced, out and when it gets scheduled in again.
+test_check: stap -p4 sched_switch.stp
+test_installcheck: stap  sched_switch.stp -c "sleep 1"
+
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 18:00:00.000000000 -0600
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-10-01 12:03:04.000000000 -0500
@@ -0,0 +1,69 @@
+/* This script works similar to ftrace's sched_switch. It displays a list of
+ * processes which get switched in and out of the scheduler. The format of display
+ * is PROCESS_NAME PROCESS_PID CPU TIMESTAMP PID: PRIORITY: PROCESS STATE ->/+
+ *    NEXT_PID : NEXT_PRIORITY: NEXT_STATE NEXT_PROCESS_NAME 
+ * -> indicates that prev process is scheduled out and the next process is 
+ *    scheduled in.
+ * + indicates that prev process has woken up the next process.
+ * The usage is sched_switch.stp <"pid"/"name"> pid/name
+ */
+
+function state_calc(state) {
+	if(state == 0)
+	status = "R"
+	if(state == 1)
+	status = "S"
+	if(state == 2)
+	status = "D"
+	if(state == 4)
+	status = "T"
+	if(state == 8)
+	status = "T"
+	if(state == 16)
+	status = "Z"
+	if(state == 32)
+	status = "EXIT_DEAD"
+	return status
+}
+probe scheduler.wakeup
+{
+	 %( $# == 2 %?
+
+	if(@1 == "pid")
+		if (task_pid != $2 && pid() != $2)
+			next
+	if(@1 == "name")
+		if (task_execname(task) != @2 && execname() != @2)
+			next
+
+	printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
+               execname(), task_cpu(task), gettimeofday_ns(),
+               pid(), task_prio(task_current()), state_calc(task_state(task_current())),
+               task_pid(task), task_prio(task), state_calc(task_state(task)),
+               task_execname(task))
+         %)
+	printf("%-16s%5d%5d%d:%d:%s + %d:%d:%s %16s\n",
+                execname(), task_cpu(task), gettimeofday_ns(),
+                pid(), task_prio(task_current()), state_calc(task_state(task_current())),
+                task_pid(task), task_prio(task), state_calc(task_state(task)),
+                task_execname(task))
+}
+probe scheduler.ctxswitch
+{
+	%( $# == 2 %?
+	
+	if(@1 == "pid") 
+		if (next_pid != $2 && prev_pid != $2)
+			next
+	if(@1 == "name")
+		if (prev_task_name != @2 && next_task_name != @2)
+			next 
+	printf("%-16s%5d%5d%d:%d:%s ==> %d:%d:%s %16s\n",prev_task_name,
+                task_cpu(prev_task),gettimeofday_ns(),prev_pid,prev_priority,state_calc(prevtsk_state),next_pid,
+                next_priority,state_calc(nexttsk_state),next_task_name)
+	%)
+
+	printf("%-16s%5d%5d%d:%d:%s ==> %d:%d:%s %16s\n",prev_task_name,
+		task_cpu(prev_task),gettimeofday_ns(),prev_pid,prev_priority,state_calc(prevtsk_state),next_pid,
+		next_priority,state_calc(nexttsk_state),next_task_name)
+}


Thanks,
Kiran

On Tue, 2009-09-29 at 13:18 -0700, Josh Stone wrote:
> On 09/29/2009 01:54 AM, Kiran wrote:
> > Hi,
> > 
> > I have attached the next version of the patch. This incorporates most of
> > the comments given by Josh and Frank. This was tested on mainline linux
> > kernel 2.6.31 on ppc64 and x86_64. I have attached sample output for
> > ppc64.
> > 
> > Frank,
> > My full name is Kiran Prakash 
> > 
> > Josh,
> > I was not able to merge the signal and kprocess tapsets with my
> > scheduler tapset as my scheduler tapset probes the tracepoints mainly
> > and in case tracepoints are not present, it probes the kernel functions.
> > So I can define namespaces only for the variables exported by the
> > tracepoint.
> 
> Understood, thanks for looking.
> 
> Frank's comment on mixing up pid/tid still needs to be addressed, e.g.:
> 
> > +probe scheduler.wakeup
> > + = kernel.trace("sched_wakeup") !,
> > +   kernel.function("try_to_wake_up")
> > +{
> > +	task = $p
> > +        task_pid = $p->pid
> > +	task_tid = task_tid($p)
> > +        task_priority = $p->prio
> > +	task_cpu = task_cpu($p)
> > +	task_state = task_state($p)
> > +}
> 
> Note that $p->pid is identical to task_tid($p).  What we really want
> here and everywhere else is:
> 
>   task_pid = $p->tgid
>   task_tid = $p->pid
> 
> (Also a formatting nitpick -- there's a mix of tabs and spaces in here
> that should be more consistent.)
> 
> > diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> > --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> > +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-29 00:27:58.000000000 -0400
> > @@ -0,0 +1,68 @@
> > +/* This script wokrs similar to ftrace's sched_switch. It displays a list of
> 
> s/wokrs/works/
> 
> > +probe scheduler.wakeup
> > +{
> > +	 %( $# == 2 %?
> > +
> > +        if(@1 == "pid")
> > +                if (task_tid != $2 && task_tid(task_current()) != $2)
> > +                        next
> 
> task_tid(task_current()) => tid()
> 
> However, if the filter is named "pid", shouldn't the comparison be done
> by pid instead of tid?
> 
> > +        if(@1 == "name")
> > +                if (task_execname(task) != @2 && execname() != @2)
> > +                        next
> > +
> > +        
> > +                if ((@1 == "pid" && (task_tid == $2 || task_pid(task_current()) == $2)) ||
> > +                   (@1 == "name" && (task_execname(task_current()) == @2 || task_execname(task) == @2)))
> 
> This condition appears to already be guaranteed by the two filter checks
> above.
> 
> > +                        printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> > +                                execname(), task_cpu(task), gettimeofday_ns(),
> > +                                task_tid(task_current()), task_prio(task_current()), state_calc(task_state(task_current())),
> > +                                task_tid(task), task_prio(task), state_calc(task_state(task)),
> > +                                task_execname(task))
> > +         %)
> 
> The way you've placed this %) means that nothing will print if no filter
> arguments were used.  Is that intentional?  At least, it's not
> consistent with the printing in your scheduler.ctxswitch probe...
> 
> > +probe scheduler.ctxswitch
> > +{
> > +	%( $# == 2 %?
> > +	
> > +	if(@1 == "pid") 
> > +		if (next_tid != $2 && prev_tid != $2)
> > +			next
> 
> This has the same confusion of reading "pid" but comparing tids...
> 
> > +	printf("%s\t\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",prev_task_name,
> > +		task_cpu(prev_task),gettimeofday_ns(),prev_tid,prev_priority,state_calc(prevtsk_state),next_tid,
> > +		next_priority,state_calc(nexttsk_state),next_task_name)
> [...]
> > swapper         2       1254199017476107523     0:140:R ==> 1338:115:R kjournald
> > kjournald               2       1254199017476137236     1338:115:D ==> 0:140:R swapper
> > swapper         2       1254199017482149965     0:40:R + 1338:15:D kjournald
> 
> Are the tabs significant to you for parsing this output?  Visually, this
> would line up better if you used field widths for spacing.
> (e.g. %-16s for execnames).
> 
> Thanks,
> 
> Josh


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
  2009-10-01 17:09       ` Kiran
@ 2009-10-01 17:58         ` Josh Stone
  2009-10-02  5:09           ` Kiran
  0 siblings, 1 reply; 8+ messages in thread
From: Josh Stone @ 2009-10-01 17:58 UTC (permalink / raw)
  To: Kiran; +Cc: Frank Ch. Eigler, systemtap

On 10/01/2009 10:09 AM, Kiran wrote:
> Hi Josh,
> 
> Thanks a lot for your comments. I am sending version 4 of the patch
> incorporating your comments.

Hi Kiran,

This looks good, just one more thing:

> +	printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> +               execname(), task_cpu(task), gettimeofday_ns(),
> +               pid(), task_prio(task_current()), state_calc(task_state(task_current())),
> +               task_pid(task), task_prio(task), state_calc(task_state(task)),
> +               task_execname(task))
> +         %)
> +	printf("%-16s%5d%5d%d:%d:%s + %d:%d:%s %16s\n",
> +                execname(), task_cpu(task), gettimeofday_ns(),
> +                pid(), task_prio(task_current()), state_calc(task_state(task_current())),
> +                task_pid(task), task_prio(task), state_calc(task_state(task)),
> +                task_execname(task))
[...]
> +	printf("%-16s%5d%5d%d:%d:%s ==> %d:%d:%s %16s\n",prev_task_name,
> +                task_cpu(prev_task),gettimeofday_ns(),prev_pid,prev_priority,state_calc(prevtsk_state),next_pid,
> +                next_priority,state_calc(nexttsk_state),next_task_name)
> +	%)
> +
> +	printf("%-16s%5d%5d%d:%d:%s ==> %d:%d:%s %16s\n",prev_task_name,
> +		task_cpu(prev_task),gettimeofday_ns(),prev_pid,prev_priority,state_calc(prevtsk_state),next_pid,
> +		next_priority,state_calc(nexttsk_state),next_task_name)
> +}

I'm confused why these printfs are now duplicated; I think only those
outside the '%)' are needed.  If you agree, then I'll just remove the
inner printfs and then commit this.

Thanks,

Josh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
  2009-10-01 17:58         ` Josh Stone
@ 2009-10-02  5:09           ` Kiran
  0 siblings, 0 replies; 8+ messages in thread
From: Kiran @ 2009-10-02  5:09 UTC (permalink / raw)
  To: Josh Stone; +Cc: Frank Ch. Eigler, systemtap

On Thu, 2009-10-01 at 10:58 -0700, Josh Stone wrote:
> On 10/01/2009 10:09 AM, Kiran wrote:
> > Hi Josh,
> > 
> > Thanks a lot for your comments. I am sending version 4 of the patch
> > incorporating your comments.
> 
> Hi Kiran,
> 
> This looks good, just one more thing:
> 
> > +	printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> > +               execname(), task_cpu(task), gettimeofday_ns(),
> > +               pid(), task_prio(task_current()), state_calc(task_state(task_current())),
> > +               task_pid(task), task_prio(task), state_calc(task_state(task)),
> > +               task_execname(task))
> > +         %)
> > +	printf("%-16s%5d%5d%d:%d:%s + %d:%d:%s %16s\n",
> > +                execname(), task_cpu(task), gettimeofday_ns(),
> > +                pid(), task_prio(task_current()), state_calc(task_state(task_current())),
> > +                task_pid(task), task_prio(task), state_calc(task_state(task)),
> > +                task_execname(task))
> [...]
> > +	printf("%-16s%5d%5d%d:%d:%s ==> %d:%d:%s %16s\n",prev_task_name,
> > +                task_cpu(prev_task),gettimeofday_ns(),prev_pid,prev_priority,state_calc(prevtsk_state),next_pid,
> > +                next_priority,state_calc(nexttsk_state),next_task_name)
> > +	%)
> > +
> > +	printf("%-16s%5d%5d%d:%d:%s ==> %d:%d:%s %16s\n",prev_task_name,
> > +		task_cpu(prev_task),gettimeofday_ns(),prev_pid,prev_priority,state_calc(prevtsk_state),next_pid,
> > +		next_priority,state_calc(nexttsk_state),next_task_name)
> > +}
> 
> I'm confused why these printfs are now duplicated; I think only those
> outside the '%)' are needed.  If you agree, then I'll just remove the
> inner printfs and then commit this.
> 
> Thanks,
> 
> Josh

Hi Josh,
I agree with you. The inner printfs are duplicated.
Could you please remove them? 

Thanks,
Kiran

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-10-02  5:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-23 14:13 [PATCH v2] Scheduler Tapset based on kernel tracepoints Kiran
2009-09-25 20:20 ` Frank Ch. Eigler
2009-09-25 22:52 ` Josh Stone
2009-09-29  8:55   ` [PATCH v3] " Kiran
2009-09-29 20:19     ` Josh Stone
2009-10-01 17:09       ` Kiran
2009-10-01 17:58         ` Josh Stone
2009-10-02  5:09           ` Kiran

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).