public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* Analysis of broken systemtap example scripts
@ 2018-10-10 15:50 William Cohen
  2018-10-10 21:06 ` William Cohen
  0 siblings, 1 reply; 4+ messages in thread
From: William Cohen @ 2018-10-10 15:50 UTC (permalink / raw)
  To: systemtap, Jafeer Uddin, Frank Ch. Eigler

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

Hi,

I am going through the systemtap example scripts and see what things
are still broken and what needs to be done to fix them.

The tapset update due to the changes in the kernel changes in syscall
mechanisms has affected a number of scripts.  socket-events.stp used
internal __syscall.send.return and __syscall.sendmsg.return (marking
those with 'I').  Also a number of scripts assume that the dwarf based
kprobe/kretprobe were being used and had @entry() available for target
variables (marking those with 'E').  Target variable availablility is
questionable since the tapsets often fall to using the non-dwarf
variant which don't provide target variables ('T') or guru mode
changing of varialble ('G')

FAIL: systemtap.examples/general/socket-events build (IE)
FAIL: systemtap.examples/io/iotime build (E)
FAIL: systemtap.examples/process/futexes build (E)
FAIL: systemtap.examples/process/futexes2 build (E)
FAIL: systemtap.examples/process/mutex-contention build (E)
FAIL: systemtap.examples/io/eatmydata build (TG)
FAIL: systemtap.examples/process/noptrace build (TG)

Is there some way to make the dwarf-based probes work on newer 4.18
kernel?  If that was the case the tests above would just work.

For scripts that expect the entry target variable it is possible to
rewrite them to eliminate the @entry() uses.  Attached is a patch that
does that for futexes.stp futexes2.stp, and mutex-contention.stp.  The
main downside is there is more contention for the global associative
array and a lot more skipped probes.


The following test failed because a target variable was no longer
available or couldn't be found.  Some of this is because the function
is now being inline and the parameter just doesn't exist.

FAIL: systemtap.examples/general/varwatch build (inlined function)
FAIL: systemtap.examples/io/capture_ssl_master_secrets build
FAIL: systemtap.examples/network/tcp_trace build (inlined function)

Changes in the kernel internals caused the following tests to fail:
FAIL: systemtap.examples/profiling/periodic build
FAIL: systemtap.examples/process/pfiles build

Perf counters not available on guest vm:
FAIL: systemtap.examples/profiling/perf run

Not sure what is happening with the following test:
FAIL: systemtap.examples/general/badname run


-Will

[-- Attachment #2: mutex_sans_kretprobes.diff --]
[-- Type: text/x-patch, Size: 3492 bytes --]

diff --git a/testsuite/systemtap.examples/process/futexes.stp b/testsuite/systemtap.examples/process/futexes.stp
index 84f000e5f..1a2fb93e4 100755
--- a/testsuite/systemtap.examples/process/futexes.stp
+++ b/testsuite/systemtap.examples/process/futexes.stp
@@ -10,11 +10,22 @@ global FUTEX_CLOCK_REALTIME = 256 /* linux 2.6.29+ */
 global lock_waits # long-lived stats on (tid,lock) blockage elapsed time
 global process_names # long-lived pid-to-execname mapping
 
-probe syscall.futex.return {  
-  if (($op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next
-  process_names[pid()] = execname()
-  elapsed = gettimeofday_us() - @entry(gettimeofday_us())
-  lock_waits[pid(), $uaddr] <<< elapsed
+global entry_times, uaddrs
+
+probe syscall.futex.return {
+  if (!(entry_times[tid()])) next
+  if (!(pid() in process_names))
+    process_names[pid()] = execname()
+  elapsed = gettimeofday_us() - entry_times[tid()]
+  lock_waits[pid(), uaddrs[tid()]] <<< elapsed
+  delete entry_times[tid()]
+  delete uaddrs[tid()]
+}
+
+probe syscall.futex {
+  if ((op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next
+  entry_times[tid()] = gettimeofday_us()
+  uaddrs[tid()] = futex_uaddr
 }
 
 probe end {
diff --git a/testsuite/systemtap.examples/process/futexes2.stp b/testsuite/systemtap.examples/process/futexes2.stp
index 4c9dd1948..b371139e5 100644
--- a/testsuite/systemtap.examples/process/futexes2.stp
+++ b/testsuite/systemtap.examples/process/futexes2.stp
@@ -35,17 +35,24 @@ probe kernel.function("futex_wait_setup").return {
   }
 }
 
+global entry_times
+
+probe syscall.futex {
+  if ((op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next
+  entry_times[tid()] = gettimeofday_us()
+}
+
 probe syscall.futex.return {  
   if (!(tid() in wait_keys)) next
   key = wait_keys[tid()]
   delete wait_keys[tid()]
 
-  cmd = @entry($op) & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME);
-  if (cmd != FUTEX_WAIT) next
+  if (!(tid() in entry_times)) next
 
   process_names[pid()] = execname()
-  elapsed = gettimeofday_us() - @entry(gettimeofday_us())
+  elapsed = gettimeofday_us() - entry_times[tid()]
   lock_waits[pid(), key] <<< elapsed
+  delete entry_times[tid()]
 }
 
 probe end {
diff --git a/testsuite/systemtap.examples/process/mutex-contention.stp b/testsuite/systemtap.examples/process/mutex-contention.stp
index 66a67878f..69bd89655 100755
--- a/testsuite/systemtap.examples/process/mutex-contention.stp
+++ b/testsuite/systemtap.examples/process/mutex-contention.stp
@@ -92,14 +92,25 @@ probe process("/lib*/libpthread.so*").function("__pthread_rwlock_init")
     process_mutex_init($rwlock, ppfunc())
 }
 
+global entry_times, uaddrs
+
+probe syscall.futex {
+  if (pid() != target()) next		// skip irrelevant processes
+  if ((op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next
+  entry_times[tid()] = gettimeofday_us()
+  uaddrs[tid()] = futex_uaddr
+}
+
 probe syscall.futex.return
 {
-  op = @entry($op)
-  uaddr = @entry($uaddr)
-  if ((op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next
-  if (pid() != target()) next		// skip irrelevant processes
+  if (!(tid() in entry_times)) next
+  entry = entry_times[tid()]
+  delete entry_times[tid()]
+
+  uaddr = uaddrs[tid()]
+  delete uaddrs[tid()]
 
-  elapsed = gettimeofday_us() - @entry(gettimeofday_us())
+  elapsed = gettimeofday_us() - entry
   if (keep_stats) {
     mutex_contention[uaddr] <<< elapsed
     stack = sprint_ubacktrace()

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Analysis of broken systemtap example scripts
  2018-10-10 15:50 Analysis of broken systemtap example scripts William Cohen
@ 2018-10-10 21:06 ` William Cohen
  2018-10-10 21:24   ` Frank Ch. Eigler
  0 siblings, 1 reply; 4+ messages in thread
From: William Cohen @ 2018-10-10 21:06 UTC (permalink / raw)
  To: systemtap, Jafeer Uddin, Frank Ch. Eigler

On 10/10/18 11:50 AM, William Cohen wrote:
> Hi,
> 
> I am going through the systemtap example scripts and see what things
> are still broken and what needs to be done to fix them.
> 
> The tapset update due to the changes in the kernel changes in syscall
> mechanisms has affected a number of scripts.  socket-events.stp used
> internal __syscall.send.return and __syscall.sendmsg.return (marking
> those with 'I').  Also a number of scripts assume that the dwarf based
> kprobe/kretprobe were being used and had @entry() available for target
> variables (marking those with 'E').  Target variable availablility is
> questionable since the tapsets often fall to using the non-dwarf
> variant which don't provide target variables ('T') or guru mode
> changing of varialble ('G')
> 
> FAIL: systemtap.examples/general/socket-events build (IE)
> FAIL: systemtap.examples/io/iotime build (E)
> FAIL: systemtap.examples/process/futexes build (E)
> FAIL: systemtap.examples/process/futexes2 build (E)
> FAIL: systemtap.examples/process/mutex-contention build (E)
> FAIL: systemtap.examples/io/eatmydata build (TG)
> FAIL: systemtap.examples/process/noptrace build (TG)
> 
> Is there some way to make the dwarf-based probes work on newer 4.18
> kernel?  If that was the case the tests above would just work.
> 
> For scripts that expect the entry target variable it is possible to
> rewrite them to eliminate the @entry() uses.  Attached is a patch that
> does that for futexes.stp futexes2.stp, and mutex-contention.stp.  The
> main downside is there is more contention for the global associative
> array and a lot more skipped probes.

I have checked in the change for futexes.stp, futexes2.stp and mutex-contention.stp.  I am looking into a slightly different way to address the issue with iotime.stp.  Trying to use non-dwarf access to arguments in the @entry.  This works on older kernels but not the newer fedora kernel.  I wrote a little script to illustrate the issue.

[wcohen@paketa ~]$ uname -a
Linux paketa 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[wcohen@paketa ~]$ cat open_args.stp 
function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe syscall.openat {
  printf("%s %s (%x) %s\n", proc(), pp(), pointer_arg(2), filename)
}

probe syscall.openat.return {
  filename = @entry(user_string_quoted(pointer_arg(2)))
  printf("%s %s (%x) %s\n", proc(), pp(), @entry(pointer_arg(2)), filename)
}
[wcohen@paketa ~]$ stap open_args.stp  -T 10
989 (systemd-journal) kernel.function("SyS_openat@fs/open.c:1071").call? (558a7fa23530) "/run/log/journal/1415a05ffedc445fa4381c4d5fcadcbf"
989 (systemd-journal) kernel.function("SyS_openat@fs/open.c:1071").return? (558a7fa23530) "/run/log/journal/1415a05ffedc445fa4381c4d5fcadcbf"
11897 (setroubleshootd) kernel.function("SyS_openat@fs/open.c:1071").call? (cd58e0) "/usr/lib64/python2.7/site-packages"
11897 (setroubleshootd) kernel.function("SyS_openat@fs/open.c:1071").return? (cd58e0) "/usr/lib64/python2.7/site-packages"
11897 (setroubleshootd) kernel.function("SyS_openat@fs/open.c:1071").call? (cd58e0) "/usr/lib/python2.7/site-packages"
11897 (setroubleshootd) kernel.function("SyS_openat@fs/open.c:1071").return? (cd58e0) "/usr/lib/python2.7/site-packages"
11897 (setroubleshootd) kernel.function("SyS_openat@fs/open.c:1071").call? (e489f0) "//etc/selinux/targeted/policy"
11897 (setroubleshootd) kernel.function("SyS_openat@fs/open.c:1071").return? (e489f0) "//etc/selinux/targeted/policy"
...


However, running the same script on a newer fedora kernel:


[wcohen@cervelo io]$ uname -a
Linux cervelo 4.18.11-200.fc28.x86_64 #1 SMP Sun Sep 30 15:31:40 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[wcohen@cervelo io]$ stap open_args.stp -T 1
1335 (libvirtd) kprobe.function("__x64_sys_openat")? (7f331f4b66e0) "/sys/devices/system/node/node0/meminfo"
1335 (libvirtd) kprobe.function("__x64_sys_openat").return? (ffffb97f445eff58) 0xffffb97f445eff58
1335 (libvirtd) kprobe.function("__x64_sys_openat")? (7f33371bbb37) "/proc/cpuinfo"
1335 (libvirtd) kprobe.function("__x64_sys_openat").return? (ffffb97f445eff58) 0xffffb97f445eff58
1335 (libvirtd) kprobe.function("__x64_sys_openat")? (7f3318028f10) "/sys/devices/system/cpu/present"
1335 (libvirtd) kprobe.function("__x64_sys_openat").return? (ffffb97f445eff58) 0xffffb97f445eff58
1335 (libvirtd) kprobe.function("__x64_sys_openat")? (7f331801a170) "/sys/devices/system/cpu/online"
1335 (libvirtd) kprobe.function("__x64_sys_openat").return? (ffffb97f445eff58) 0xffffb97f445eff58
...

The pointers and strings look okay on the entry probe, but are corrupted for in the return probe even when they are in an @entry().

-Will

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Analysis of broken systemtap example scripts
  2018-10-10 21:06 ` William Cohen
@ 2018-10-10 21:24   ` Frank Ch. Eigler
  2018-10-10 21:36     ` William Cohen
  0 siblings, 1 reply; 4+ messages in thread
From: Frank Ch. Eigler @ 2018-10-10 21:24 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap, Jafeer Uddin

Hi -

> [....]
> probe syscall.openat.return {
>   filename = @entry(user_string_quoted(pointer_arg(2)))
>   printf("%s %s (%x) %s\n", proc(), pp(), @entry(pointer_arg(2)), filename)
> }

> However, running the same script on a newer fedora kernel:
> [wcohen@cervelo io]$ stap open_args.stp -T 1
> 1335 (libvirtd) kprobe.function("__x64_sys_openat")? (7f331f4b66e0) "/sys/devices/system/node/node0/meminfo"
> [...]

Yes, this is expected, because @entry() expressions are evaluated at
synthetic probes at function.  The syscall.* probes for 4.17+ work by
first processing the incoming pt_regs pointer from a parameter.  The
@entry() part misses this step, so cannot work.

So, both @entry($var) and $var in syscall.*.return are not generally
available any more.  We'll improve @entry() to generalize to this sort
of situation, but in the mean time, user and samples will have to fend
for themselves with explicit globals etc.

- FChE

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Analysis of broken systemtap example scripts
  2018-10-10 21:24   ` Frank Ch. Eigler
@ 2018-10-10 21:36     ` William Cohen
  0 siblings, 0 replies; 4+ messages in thread
From: William Cohen @ 2018-10-10 21:36 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap, Jafeer Uddin

On 10/10/18 5:24 PM, Frank Ch. Eigler wrote:
> Hi -
> 
>> [....]
>> probe syscall.openat.return {
>>   filename = @entry(user_string_quoted(pointer_arg(2)))
>>   printf("%s %s (%x) %s\n", proc(), pp(), @entry(pointer_arg(2)), filename)
>> }
> 
>> However, running the same script on a newer fedora kernel:
>> [wcohen@cervelo io]$ stap open_args.stp -T 1
>> 1335 (libvirtd) kprobe.function("__x64_sys_openat")? (7f331f4b66e0) "/sys/devices/system/node/node0/meminfo"
>> [...]
> 
> Yes, this is expected, because @entry() expressions are evaluated at
> synthetic probes at function.  The syscall.* probes for 4.17+ work by
> first processing the incoming pt_regs pointer from a parameter.  The
> @entry() part misses this step, so cannot work.
> 
> So, both @entry($var) and $var in syscall.*.return are not generally
> available any more.  We'll improve @entry() to generalize to this sort
> of situation, but in the mean time, user and samples will have to fend
> for themselves with explicit globals etc.
> 
> - FChE
> 


It doesn't help that the syscall interface functions are special and probably confuses the kretprobes implementation.  With the fedora 28 4.18.11-200 x86_64 kernel end up with:

ffffffff812a44b0 <__x64_sys_openat>:
ffffffff812a44b0:	e8 eb d2 75 00       	callq  ffffffff81a017a0 <__fentry__>
ffffffff812a44b5:	8b 57 60             	mov    0x60(%rdi),%edx
ffffffff812a44b8:	0f b7 4f 38          	movzwl 0x38(%rdi),%ecx
ffffffff812a44bc:	48 8b 77 68          	mov    0x68(%rdi),%rsi
ffffffff812a44c0:	48 8b 7f 70          	mov    0x70(%rdi),%rdi
ffffffff812a44c4:	80 ce 80             	or     $0x80,%dh
ffffffff812a44c7:	e9 94 fd ff ff       	jmpq   ffffffff812a4260 <do_sys_open>
ffffffff812a44cc:	0f 1f 40 00          	nopl   0x0(%rax)

Looks like all the arguments are being pullin from the ptreg struct point to by %rdi and then a jmp to the actual function to handle the syscall. I am still a little confused why the kprobe.function("__x64_sys_openat")? probe got the correct values if the registers have been cleared.

-Will

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-10-10 21:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-10 15:50 Analysis of broken systemtap example scripts William Cohen
2018-10-10 21:06 ` William Cohen
2018-10-10 21:24   ` Frank Ch. Eigler
2018-10-10 21:36     ` William Cohen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).