Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible
@ 2018-09-18 13:58 William Cohen
  2018-09-19 21:14 ` David Smith
  0 siblings, 1 reply; 5+ messages in thread
From: William Cohen @ 2018-09-18 13:58 UTC (permalink / raw)
  To: systemtap

[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]

Hi,

The syscall tapsets have been under a lot of change lately due to the changes in the linux syscall mechanism.  The syscall tapset updates are still in progress and a number of the examples still fail as a result.  Using the kernel.trace("sys_enter") in place of the syscall.* and kernel.trace("sys_exit") in place of the syscall.*.return can improve this situation.  The WIP attached patch changes 9 FAIL and 8 UNSUPPORTED into PASSES.  It does result in faster compilation of the instrumentation and smaller kernel modules.

There are a couple issues with the patch right now.  The direct use of kernel.trace("sys_enter") and kernel.trace("sys_exit") might not be that clear to readers of the script, but using the raw tracepoints will allow some of the examples to work with older versions of systemtap.  For the kernel.trace("sys_exit") is quite common to need the syscall number or name.  There is an internal function _stp_syscall_nr() that provides that information.  It would be good to have an official user visible function for this value.

What comments and feedback do people have about the patch?

-Will



[-- Attachment #2: 0001-Use-sys_enter-and-sys_exit-tracepoints-in-place-of-s.patch --]
[-- Type: text/x-patch, Size: 8781 bytes --]

From 3984323c39349d45628be70a35b107c910d44fae Mon Sep 17 00:00:00 2001
From: William Cohen <wcohen@redhat.com>
Date: Mon, 17 Sep 2018 16:58:09 -0400
Subject: [PATCH] Use sys_enter and sys_exit tracepoints in place of
 syscall.*{.return}

The common probe point idiom of syscall.* and syscall.*.return can be
replaced with equivalent sys_enter and sys_exit tracepoints for a
number of the example scripts.  The advantages are:

-Quicker compilation of the script into instrumenation
-Smaller kernels modules for the instrumentation
-Lower overhead for probe points

This changes are not applicable to all uses use syscall.* and
syscall.*.return. The predefined variable such as argstr are not
available for the sys_enter and sys_exit trace points.

Some of the revised examples are using the internal _stp_syscall_nr())
function.  A user visible version of this function should be
available.
---
 testsuite/systemtap.examples/general/eventcount.meta     | 2 +-
 testsuite/systemtap.examples/general/stopwatches.stp     | 4 ++--
 testsuite/systemtap.examples/lwtools/syscallbypid-nd.stp | 3 ++-
 testsuite/systemtap.examples/process/syscalls_by_pid.stp | 2 +-
 .../systemtap.examples/process/syscalls_by_proc.stp      | 3 ++-
 testsuite/systemtap.examples/process/syscalltimes        | 9 +++++----
 testsuite/systemtap.examples/process/thread-business.stp | 6 +++---
 .../systemtap.examples/profiling/container_check.stp     | 7 ++++---
 testsuite/systemtap.examples/profiling/errno.stp         | 7 ++++---
 testsuite/systemtap.examples/profiling/topsys.stp        | 3 ++-
 10 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/testsuite/systemtap.examples/general/eventcount.meta b/testsuite/systemtap.examples/general/eventcount.meta
index 200ac5645..6567a3ea5 100644
--- a/testsuite/systemtap.examples/general/eventcount.meta
+++ b/testsuite/systemtap.examples/general/eventcount.meta
@@ -10,4 +10,4 @@ output: batch on-exit
 scope: system-wide
 description: The script periodically prints a count of specified events and their related tid's over the course of execution. Numerous configuration options exist to control filtering / reporting, some of which can be modified at runtime. See the script source for more information.
 test_check: stap -p4 eventcount.stp
-test_installcheck: stap eventcount.stp 'syscall.*' -c 'sleep 3'
+test_installcheck: stap eventcount.stp 'kernel.trace("sys_enter")' -c 'sleep 3'
diff --git a/testsuite/systemtap.examples/general/stopwatches.stp b/testsuite/systemtap.examples/general/stopwatches.stp
index 60275b974..8c28f3a5f 100755
--- a/testsuite/systemtap.examples/general/stopwatches.stp
+++ b/testsuite/systemtap.examples/general/stopwatches.stp
@@ -12,14 +12,14 @@ probe begin
   stop_stopwatch("system")
 }
 
-probe syscall.*
+probe kernel.trace("sys_enter")
 {
   if (pid() != target()) next
   stop_stopwatch("user")
   start_stopwatch("system")
 }
 
-probe syscall.*.return
+probe kernel.trace("sys_exit")
 {
   if (pid() != target()) next
   start_stopwatch("user")
diff --git a/testsuite/systemtap.examples/lwtools/syscallbypid-nd.stp b/testsuite/systemtap.examples/lwtools/syscallbypid-nd.stp
index edd70cddb..4ed64908c 100755
--- a/testsuite/systemtap.examples/lwtools/syscallbypid-nd.stp
+++ b/testsuite/systemtap.examples/lwtools/syscallbypid-nd.stp
@@ -27,8 +27,9 @@ probe begin
 	printf("Tracing syscall completions... Hit Ctrl-C to end.\n");
 }
 
-probe nd_syscall.*.return
+probe kernel.trace("sys_exit")
 {
+	name = syscall_name(_stp_syscall_nr())
 	num[pid(), execname(), name] <<< 1;
 }
 
diff --git a/testsuite/systemtap.examples/process/syscalls_by_pid.stp b/testsuite/systemtap.examples/process/syscalls_by_pid.stp
index 06ed7b727..5cf64be12 100755
--- a/testsuite/systemtap.examples/process/syscalls_by_pid.stp
+++ b/testsuite/systemtap.examples/process/syscalls_by_pid.stp
@@ -17,7 +17,7 @@ probe begin {
   print ("Collecting data... Type Ctrl-C to exit and display results\n")
 }
 
-probe nd_syscall.* {
+probe kernel.trace("sys_enter") {
   syscalls[pid()]++
 }
 
diff --git a/testsuite/systemtap.examples/process/syscalls_by_proc.stp b/testsuite/systemtap.examples/process/syscalls_by_proc.stp
index cf89424eb..dcf480611 100755
--- a/testsuite/systemtap.examples/process/syscalls_by_proc.stp
+++ b/testsuite/systemtap.examples/process/syscalls_by_proc.stp
@@ -1,6 +1,7 @@
 #! /usr/bin/env stap
 
 # Copyright (C) 2006 IBM Corp.
+# Copyright (C) 2018 Red Hat, Inc.
 #
 # This file is part of systemtap, and is free software.  You can
 # redistribute it and/or modify it under the terms of the GNU General
@@ -17,7 +18,7 @@ probe begin {
   print ("Collecting data... Type Ctrl-C to exit and display results\n")
 }
 
-probe nd_syscall.* {
+probe kernel.trace("sys_enter") {
   syscalls[execname()]++
 }
 
diff --git a/testsuite/systemtap.examples/process/syscalltimes b/testsuite/systemtap.examples/process/syscalltimes
index 3aae186ec..08d11ea99 100755
--- a/testsuite/systemtap.examples/process/syscalltimes
+++ b/testsuite/systemtap.examples/process/syscalltimes
@@ -2,7 +2,7 @@
 
 # Syscalltimes systemtap script
 # Copyright (C) 2007 IBM Corp.
-# Copyright (C) 2011 Red Hat, Inc.
+# Copyright (C) 2011,2018 Red Hat, Inc.
 #
 # This file is part of systemtap, and is free software.  You can
 # redistribute it and/or modify it under the terms of the GNU General
@@ -138,11 +138,12 @@ probe begin {
 	}
 }
 
-probe syscall.* {
-	starttime[name, tid()] = gettimeofday_ns()
+probe kernel.trace("sys_enter") {
+	starttime[syscall_name($id), tid()] = gettimeofday_ns()
 }
 
-probe syscall.*.return {
+probe kernel.trace("sys_exit") {
+	name = syscall_name(_stp_syscall_nr())
 	# Skip if we have not seen this before
 	if (!([name, tid()] in starttime)) next
 
diff --git a/testsuite/systemtap.examples/process/thread-business.stp b/testsuite/systemtap.examples/process/thread-business.stp
index 5e258c017..71aa5b8b6 100755
--- a/testsuite/systemtap.examples/process/thread-business.stp
+++ b/testsuite/systemtap.examples/process/thread-business.stp
@@ -6,10 +6,10 @@ global activity2          // [execname,tid]->syscall-name-string-history
 global syscall_history_length = 50 // override with stap -Gsyscall_history_length=NNN
 global top_threads = 20
 
-probe nd_syscall.* # use non-dwarf variant; we don't need context data
+probe kernel.trace("sys_enter") # use syscall tracepoint; we don't need context data
 {
   activity[execname(),tid()]<<<1
-  history = name." ".activity2[execname(),tid()]
+  history = syscall_name($id)." ".activity2[execname(),tid()]
   activity2[execname(),tid()] = substr(history,0,syscall_history_length)
 }
 
@@ -25,4 +25,4 @@ probe timer.s(5)
   printf("\n")
   delete activity
   delete activity2
-}
\ No newline at end of file
+}
diff --git a/testsuite/systemtap.examples/profiling/container_check.stp b/testsuite/systemtap.examples/profiling/container_check.stp
index 51bca8bc4..882b6582c 100755
--- a/testsuite/systemtap.examples/profiling/container_check.stp
+++ b/testsuite/systemtap.examples/profiling/container_check.stp
@@ -146,7 +146,8 @@ probe ns_capable !, capable
     cap_use[tid()] |= cap
 }
 
-probe nd_syscall.*.return {
+probe kernel.trace("sys_exit") {
+  name = syscall_name(_stp_syscall_nr())
   # note any problem capabilities use during syscall
   cap = cap_use[tid()]
   if (cap && child_of_target(task_current())) {
@@ -161,8 +162,8 @@ probe nd_syscall.*.return {
   }
 
   # note any syscalls returning errors
-  if (retval < 0 && child_of_target(task_current())) {
-    syscall_errno[execname(), name, retval] <<< 1
+  if ($ret < 0 && child_of_target(task_current())) {
+    syscall_errno[execname(), name, $ret] <<< 1
   }
 }
 
diff --git a/testsuite/systemtap.examples/profiling/errno.stp b/testsuite/systemtap.examples/profiling/errno.stp
index 599f05d1a..9137e31f6 100755
--- a/testsuite/systemtap.examples/profiling/errno.stp
+++ b/testsuite/systemtap.examples/profiling/errno.stp
@@ -1,6 +1,6 @@
 #! /usr/bin/env stap
 #
-# Copyright (C) 2010 Red Hat, Inc.
+# Copyright (C) 2010, 2018 Red Hat, Inc.
 # By Dominic Duval, Red Hat Inc.
 # dduval@redhat.com
 #
@@ -11,8 +11,9 @@
 
 global execname, errors
 
-probe syscall.*.return {
-  errno = retval
+probe kernel.trace("sys_exit") {
+  name = syscall_name(_stp_syscall_nr())
+  errno = $ret
   if ( errno < 0 ) {
     p = pid()
     execname[p]=execname();
diff --git a/testsuite/systemtap.examples/profiling/topsys.stp b/testsuite/systemtap.examples/profiling/topsys.stp
index 1263e52a2..08bcf0366 100755
--- a/testsuite/systemtap.examples/profiling/topsys.stp
+++ b/testsuite/systemtap.examples/profiling/topsys.stp
@@ -6,7 +6,8 @@
 
 global syscalls_count
 
-probe syscall.* {
+probe kernel.trace("sys_enter") {
+  name = syscall_name($id)
   syscalls_count[name] <<< 1
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible
  2018-09-18 13:58 Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible William Cohen
@ 2018-09-19 21:14 ` David Smith
  2018-09-20 15:12   ` William Cohen
  0 siblings, 1 reply; 5+ messages in thread
From: David Smith @ 2018-09-19 21:14 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

In testsuite/systemtap.examples/profiling/container_check.stp, you
used _stp_syscall_nr(). I wouldn't do that, I'd use $id. I'm not 100%
sure that _stp_syscall_nr() is going to work on every arch at that
point.

I also wonder if you shouldn't use the old code as a fallback,
something like the following:

====
probe kernel.trace("sys_exit")!, nd_syscall.*.return {
    # probe that doesn't do anything with the syscall info
}
====

That gets trickier if the probe does something with the syscall info.
On Tue, Sep 18, 2018 at 8:58 AM William Cohen <wcohen@redhat.com> wrote:
>
> Hi,
>
> The syscall tapsets have been under a lot of change lately due to the changes in the linux syscall mechanism.  The syscall tapset updates are still in progress and a number of the examples still fail as a result.  Using the kernel.trace("sys_enter") in place of the syscall.* and kernel.trace("sys_exit") in place of the syscall.*.return can improve this situation.  The WIP attached patch changes 9 FAIL and 8 UNSUPPORTED into PASSES.  It does result in faster compilation of the instrumentation and smaller kernel modules.
>
> There are a couple issues with the patch right now.  The direct use of kernel.trace("sys_enter") and kernel.trace("sys_exit") might not be that clear to readers of the script, but using the raw tracepoints will allow some of the examples to work with older versions of systemtap.  For the kernel.trace("sys_exit") is quite common to need the syscall number or name.  There is an internal function _stp_syscall_nr() that provides that information.  It would be good to have an official user visible function for this value.
>
> What comments and feedback do people have about the patch?
>
> -Will
>
>


-- 
David Smith
Associate Manager
Red Hat

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible
  2018-09-19 21:14 ` David Smith
@ 2018-09-20 15:12   ` William Cohen
  2018-09-20 16:07     ` David Smith
  0 siblings, 1 reply; 5+ messages in thread
From: William Cohen @ 2018-09-20 15:12 UTC (permalink / raw)
  To: David Smith; +Cc: systemtap

On 9/19/18 5:14 PM, David Smith wrote:
> In testsuite/systemtap.examples/profiling/container_check.stp, you
> used _stp_syscall_nr(). I wouldn't do that, I'd use $id. I'm not 100%
> sure that _stp_syscall_nr() is going to work on every arch at that
> point.

Hi David,

Here are the raw tracepoints:

$ stap -L 'kernel.trace("sys_*")'
kernel.trace("raw_syscalls:sys_enter") $regs:struct pt_regs* $id:long int
kernel.trace("raw_syscalls:sys_exit") $regs:struct pt_regs* $ret:long int

It would have been preferable to use $id for the kernel.trace("sys_exit"), but it doesn't exist there.  So it was _stp_syscall_nr() which works on some machines versus $id which doesn't work on any machine.  I spent some time Wednesday changing things to have a tapset encapsulate with syscall_any and syscall_any.return probe points to hide details like the _stp_syscall_nr().

> 
> I also wonder if you shouldn't use the old code as a fallback,
> something like the following:
> 
> ====
> probe kernel.trace("sys_exit")!, nd_syscall.*.return {
>     # probe that doesn't do anything with the syscall info
> }
> ====
> 
> That gets trickier if the probe does something with the syscall info.

I considered using the nd_syscall.* and nd_syscall.*.return as fallbacks if the tracepoints were not available.  However, the sys_enter and sys_exit tracepoints have been available since 2009.  Even the RHEL6 kernel has them. It seemed unlikely that fallbacks on the nd_syscall.* would be needed, so they were omitted.

-Will  

> On Tue, Sep 18, 2018 at 8:58 AM William Cohen <wcohen@redhat.com> wrote:
>>
>> Hi,
>>
>> The syscall tapsets have been under a lot of change lately due to the changes in the linux syscall mechanism.  The syscall tapset updates are still in progress and a number of the examples still fail as a result.  Using the kernel.trace("sys_enter") in place of the syscall.* and kernel.trace("sys_exit") in place of the syscall.*.return can improve this situation.  The WIP attached patch changes 9 FAIL and 8 UNSUPPORTED into PASSES.  It does result in faster compilation of the instrumentation and smaller kernel modules.
>>
>> There are a couple issues with the patch right now.  The direct use of kernel.trace("sys_enter") and kernel.trace("sys_exit") might not be that clear to readers of the script, but using the raw tracepoints will allow some of the examples to work with older versions of systemtap.  For the kernel.trace("sys_exit") is quite common to need the syscall number or name.  There is an internal function _stp_syscall_nr() that provides that information.  It would be good to have an official user visible function for this value.
>>
>> What comments and feedback do people have about the patch?
>>
>> -Will
>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible
  2018-09-20 15:12   ` William Cohen
@ 2018-09-20 16:07     ` David Smith
  2018-09-20 17:26       ` William Cohen
  0 siblings, 1 reply; 5+ messages in thread
From: David Smith @ 2018-09-20 16:07 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On Thu, Sep 20, 2018 at 10:12 AM William Cohen <wcohen@redhat.com> wrote:
>
> On 9/19/18 5:14 PM, David Smith wrote:
> > In testsuite/systemtap.examples/profiling/container_check.stp, you
> > used _stp_syscall_nr(). I wouldn't do that, I'd use $id. I'm not 100%
> > sure that _stp_syscall_nr() is going to work on every arch at that
> > point.
>
> Hi David,
>
> Here are the raw tracepoints:
>
> $ stap -L 'kernel.trace("sys_*")'
> kernel.trace("raw_syscalls:sys_enter") $regs:struct pt_regs* $id:long int
> kernel.trace("raw_syscalls:sys_exit") $regs:struct pt_regs* $ret:long int
>
> It would have been preferable to use $id for the kernel.trace("sys_exit"), but it doesn't exist there.  So it was _stp_syscall_nr() which works on some machines versus $id which doesn't work on any machine.  I spent some time Wednesday changing things to have a tapset encapsulate with syscall_any and syscall_any.return probe points to hide details like the _stp_syscall_nr().

Ah, I didn't realize we were talking about syscall returns needing the
syscall number. I seem to recall that arm64 (and perhaps s390x) had
some restrictions about when the stuff called by _stp_syscall_nr()
could be called. You might try testing on those platforms.

> > I also wonder if you shouldn't use the old code as a fallback,
> > something like the following:
> >
> > ====
> > probe kernel.trace("sys_exit")!, nd_syscall.*.return {
> >     # probe that doesn't do anything with the syscall info
> > }
> > ====
> >
> > That gets trickier if the probe does something with the syscall info.
>
> I considered using the nd_syscall.* and nd_syscall.*.return as fallbacks if the tracepoints were not available.  However, the sys_enter and sys_exit tracepoints have been available since 2009.  Even the RHEL6 kernel has them. It seemed unlikely that fallbacks on the nd_syscall.* would be needed, so they were omitted.

OK, you talked me out of that one then.

-- 
David Smith
Associate Manager
Red Hat

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible
  2018-09-20 16:07     ` David Smith
@ 2018-09-20 17:26       ` William Cohen
  0 siblings, 0 replies; 5+ messages in thread
From: William Cohen @ 2018-09-20 17:26 UTC (permalink / raw)
  To: David Smith; +Cc: systemtap

On 9/20/18 12:07 PM, David Smith wrote:
> On Thu, Sep 20, 2018 at 10:12 AM William Cohen <wcohen@redhat.com> wrote:
>>
>> On 9/19/18 5:14 PM, David Smith wrote:
>>> In testsuite/systemtap.examples/profiling/container_check.stp, you
>>> used _stp_syscall_nr(). I wouldn't do that, I'd use $id. I'm not 100%
>>> sure that _stp_syscall_nr() is going to work on every arch at that
>>> point.
>>
>> Hi David,
>>
>> Here are the raw tracepoints:
>>
>> $ stap -L 'kernel.trace("sys_*")'
>> kernel.trace("raw_syscalls:sys_enter") $regs:struct pt_regs* $id:long int
>> kernel.trace("raw_syscalls:sys_exit") $regs:struct pt_regs* $ret:long int
>>
>> It would have been preferable to use $id for the kernel.trace("sys_exit"), but it doesn't exist there.  So it was _stp_syscall_nr() which works on some machines versus $id which doesn't work on any machine.  I spent some time Wednesday changing things to have a tapset encapsulate with syscall_any and syscall_any.return probe points to hide details like the _stp_syscall_nr().> 
> Ah, I didn't realize we were talking about syscall returns needing the
> syscall number. I seem to recall that arm64 (and perhaps s390x) had
> some restrictions about when the stuff called by _stp_syscall_nr()
> could be called. You might try testing on those platforms.
> 

There are cases where it is nicer/more efficient to make the syscall number/name available for the return only need one probe point rather than two to collect data.  An example of this are testsuite/systemtap.examples/lwtools/syscallbypid-nd.stp and testsuite/systemtap.examples/profiling/errno.stp where just want to know the syscall name/number and the return value.

Thanks for listing the architectures should look at to make sure that things work properly on. I will take a look at those and test on those machine.

>>> I also wonder if you shouldn't use the old code as a fallback,
>>> something like the following:
>>>
>>> ====
>>> probe kernel.trace("sys_exit")!, nd_syscall.*.return {
>>>     # probe that doesn't do anything with the syscall info
>>> }
>>> ====
>>>
>>> That gets trickier if the probe does something with the syscall info.
>>
>> I considered using the nd_syscall.* and nd_syscall.*.return as fallbacks if the tracepoints were not available.  However, the sys_enter and sys_exit tracepoints have been available since 2009.  Even the RHEL6 kernel has them. It seemed unlikely that fallbacks on the nd_syscall.* would be needed, so they were omitted.
> 
> OK, you talked me out of that one then.
> 

Well, I had practice. I talked myself out it earlier. :)  I previously had the probe points written just like you suggested.

-Will

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-09-20 17:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-18 13:58 Using sys_enter sys_exit trace point in place of syscall.*{.return} probes where possible William Cohen
2018-09-19 21:14 ` David Smith
2018-09-20 15:12   ` William Cohen
2018-09-20 16:07     ` David Smith
2018-09-20 17:26       ` William Cohen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).