public inbox for systemtap@sourceware.org
* Issues with current systemtap and example scripts
@ 2018-09-07 14:58 William Cohen
  2018-09-10  0:10 ` Frank Ch. Eigler
  0 siblings, 1 reply; 3+ messages in thread
From: William Cohen @ 2018-09-07 14:58 UTC
  To: systemtap; +Cc: Jafeer Uddin, Frank Ch. Eigler

Hi,

I have been working to make sure that the various example scripts
function properly.  Here are some things observed when working on the
example scripts.

Fallbacks to expensive raw sys_enter/sys_exit tracepoints

If no lower-cost probe point is found for a syscall, the
tapsets resort to instrumenting the raw sys_enter or sys_exit
tracepoints, with gating to handle only that particular syscall.  The
downside of this approach is that the sys_enter and sys_exit tracepoints
are hit for every syscall on the system, so these handlers are going
to be called very frequently.  Looking at "perf list" there are
individual syscalls:sys_enter_* and syscalls:sys_exit_* tracepoints
that might be more efficient to use
(http://www.brendangregg.com/blog/2014-07-03/perf-counting.html) but
systemtap does not use these at the moment.

This issue is visible on machines with older kernels such as RHEL7,
where the tapsets instrument the raw sys_enter/sys_exit tracepoints for
each unimplemented syscall, as seen below on an x86_64 RHEL7 machine:

$ stap -L 'syscall.*.return' |grep tracepoint
syscall.bpf.return __tracepoint_arg_regs:long __tracepoint_arg_ret:long __nr:long name:string retstr:string $regs:struct pt_regs* $ret:long int
syscall.compat_execveat.return __tracepoint_arg_regs:long __tracepoint_arg_ret:long __nr:long name:string retstr:string $regs:struct pt_regs* $ret:long int
syscall.execveat.return __tracepoint_arg_regs:long __tracepoint_arg_ret:long name:string retstr:string $regs:struct pt_regs* $ret:long int
syscall.membarrier.return __tracepoint_arg_regs:long __tracepoint_arg_ret:long __nr:long name:string retstr:string $regs:struct pt_regs* $ret:long int
syscall.mlock2.return __tracepoint_arg_regs:long __tracepoint_arg_ret:long __nr:long name:string retstr:string $regs:struct pt_regs* $ret:long int

There is also some false triggering of the error handling, as seen below in two similar scripts running on x86_64 RHEL7:

[wcohen@paketa systemtap]$ stap -e 'probe tp_syscall.open.return{ printf("tp %x %x %d\n", __tracepoint_arg_regs, $regs, _stp_syscall_nr()); exit() }'
tp ffff936b67613f58 ffff936b67613f58 2
[wcohen@paketa systemtap]$ stap -e 'probe tp_syscall.open.return{ printf("tp %x %x %d %d\n", __tracepoint_arg_regs, $regs, _stp_syscall_nr(), returnval()); exit() }'
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
ERROR: returnval() not defined in this context
tp ffff936a18bdbf58 ffff936a18bdbf58 2 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]


Newly added syscalls in recent kernels might be missing

Generating the syscall tapsets is a manual process.  Occasionally
new syscalls are added to the kernel.  A common idiom seen in
systemtap scripts is syscall.* to monitor all the syscalls.  The
systemtap scripts will miss those newly added kernel syscalls if there
are no entries for them in the tapsets.  There should be a script that
examines the output of "ausyscall --dump" and the various
tapset/linux/sysc_* and tapset/linux/${arch}/sysc_* files to identify
possible syscalls missing from the tapset.

Below is an outline of how to check for possible missing syscalls on
x86_64:

 cd tapset/linux
 ausyscall --dump|awk '{print $2}'|sort > ksyscalls
 find . -type f|grep -E "\.(/x86_64)?/sysc_([_0-9a-zA-Z]+)"| sed "s/x86_64\///g"| sed "s/\.\/sysc_//g" |sed "s/\.stp//g" |sort > ssyscalls
 diff -y ssyscalls ksyscalls
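
The side-by-side diff shows every name in both lists.  To print only
the kernel syscalls that have no tapset entry, something like the
following could be used.  This is a minimal sketch: missing_syscalls
is a hypothetical helper name, and it assumes the two list files hold
one syscall name per line, as ssyscalls and ksyscalls do above.

```shell
# Print names that appear in the kernel list but not in the tapset list.
# Usage: missing_syscalls ssyscalls ksyscalls
missing_syscalls() {
    tapset_list=$1
    kernel_list=$2
    tmpdir=$(mktemp -d) || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$tapset_list" > "$tmpdir/tapset"
    LC_ALL=C sort -u "$kernel_list" > "$tmpdir/kernel"
    # -1 drops lines only in the tapset list, -3 drops lines in both,
    # leaving the lines found only in the kernel list.
    LC_ALL=C comm -13 "$tmpdir/tapset" "$tmpdir/kernel"
    rm -rf "$tmpdir"
}
```

Names reported this way are only candidates; some may be syscalls that
are deliberately not covered or that are named differently in the tapset.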


Strive for context variable consistency between the syscall implementations

For example, it would be good to have the returnval() function work for
nd_syscall.*.return, tp_syscall.*.return, and dw_syscall.*.return,
since syscall.*.return may use any one of those implementations.

The following return_compares.stp script running on x86_64 RHEL7
illustrates the problem with returnval().  The returnval() function was
modified to access the pt_regs available from tracepoints, but
that does not seem to always work, as seen above.

Several of the example scripts make use of the side effect that kprobe
return probes fire when the function is entered, allowing target
variables to be stored on entry and then referenced in the actual
return probe.  Probe points implemented with tracepoints don't have this
property.  The examples will need to use more expensive associative
array operations to store and later retrieve the value.
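
The associative-array approach could look roughly like the script
emitted below.  This is a sketch only: open_trace_script and the
"opened" array are hypothetical names, and it assumes the syscall.open
alias supplies "filename" on entry and "retstr" on return.

```shell
# Emit a systemtap script that saves a value in the entry probe and
# reads it back in the return probe, keyed by thread id.
# Usage: open_trace_script > open_trace.stp && stap open_trace.stp
open_trace_script() {
    cat <<'EOF'
global opened

probe syscall.open {
    # Remember the entry-time argument for this thread.
    opened[tid()] = filename
}

probe syscall.open.return {
    # Only report returns for which the matching entry was seen.
    if (tid() in opened) {
        printf("%s(%d) open %s -> %s\n", execname(), pid(),
               opened[tid()], retstr)
        delete opened[tid()]
    }
}
EOF
}
```

Keying by tid() keeps concurrent calls from different threads apart,
and the delete keeps the array from growing without bound.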


* Re: Issues with current systemtap and example scripts
  2018-09-07 14:58 Issues with current systemtap and example scripts William Cohen
@ 2018-09-10  0:10 ` Frank Ch. Eigler
  2018-09-10 17:46   ` Frank Ch. Eigler
  0 siblings, 1 reply; 3+ messages in thread
From: Frank Ch. Eigler @ 2018-09-10  0:10 UTC
  To: William Cohen; +Cc: systemtap, Jafeer Uddin

Hi -


> Looking at "perf list" there are individual syscalls:sys_enter_* and
> syscalls:sys_exit_* tracepoints that might be more efficient to use
> [...]

No, they aren't what they appear to be.  They are simply demultiplexed
(after-the-fact $id filtered) virtual tracepoints built upon the
same enter/exit pair we use.


> [...]
> [wcohen@paketa systemtap]$ stap -e 'probe tp_syscall.open.return{ printf("tp %x %x %d %d\n", __tracepoint_arg_regs, $regs, _stp_syscall_nr(), returnval()); exit() }'
> ERROR: returnval() not defined in this context
> [...]

This is improved in errno.stp.


> Newly added syscalls in recent kernels might be missing
> 
> Generating the syscall tapsets is a manual process.  Occasionally
> new syscalls are added to the kernel.  [...]
>  cd tapset
>  ausyscall --dump|awk '{print $2}'|sort > ksyscalls
>  find -type f|egrep -E "\.(/x86_64)?/sysc_([_0-9a-zA-Z]+)"| sed "s/x86_64\///g"| sed "s/\.\/sysc_//g" |sed "s/\.stp//g" |sort > ssyscalls
>  diff -y ssyscalls ksyscalls

This sort of thing could go under scripts/.  See also dump-syscalls.sh
and tracepoint-diff there.


> Strive for context variable consistency between the syscall implementations
> 
> For example would like to have returnval() function working for
> nd_syscall.*.return, tp_syscall.*.return, and dw_syscall.*.return,
> since syscall.*.return may use any one of those implementations.
> [...]

Ideally, the tapset .return aliases could supply a variable like
retval, just as they supply retstr.


- FChE


* Re: Issues with current systemtap and example scripts
  2018-09-10  0:10 ` Frank Ch. Eigler
@ 2018-09-10 17:46   ` Frank Ch. Eigler
  0 siblings, 0 replies; 3+ messages in thread
From: Frank Ch. Eigler @ 2018-09-10 17:46 UTC
  To: William Cohen; +Cc: systemtap, Jafeer Uddin


>> [wcohen@paketa systemtap]$ stap -e 'probe tp_syscall.open.return{
>> printf("tp %x %x %d %d\n", __tracepoint_arg_regs, $regs,
>> _stp_syscall_nr(), returnval()); exit() }'
>> ERROR: returnval() not defined in this context
>> [...]
>
> This is improved in errno.stp.

I spoke too fast; that change is reverted because it hides the error
rather than solving the problem, which is:

- The syscalls tapset does not have a standard -variable- in .return
  probes to pass back the numeric return value.  It has 'retstr' only,
  which is a pretty-printed form.  This was an oversight.

- Because of that oversight, scripts left and right started using the
  returnval() function in errno.stp, which was meant for internal use,
  is documented as imperfect, and is normally used from dwarfless probes.

- With kernel 4.17+, the syscall.* aliases have tp rather than
  k/u-probe alternatives.  Thus no registers, thus a runtime error when
  returnval()/returnstr() are used.  This is what wcohen's tests
  noticed.

So, what to do.

1) Correct the old oversight: add and document a "retval" variable to be
supplied by all the syscall.*.return probe aliases.  Adjust sample
scripts to taste.  Should be a mechanical change to the (gulp) hundreds
of tapset/linux/sysc_* files.

     ...
     retstr = returnstr(1) 
  +  retval = returnval()
     ...
or
     ...
     retstr = return_str(1,$ret)
  +  retval = $ret
     ...
etc.
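
The mechanical part of this could be sketched with a small awk filter.
This is an illustration only: add_retval is a hypothetical helper name,
it assumes each retstr assignment sits on its own line in exactly one
of the two forms shown above, and it is not idempotent (running it
twice adds the retval line twice).

```shell
# Read a tapset file on stdin and write it to stdout with a retval
# assignment added after each recognized retstr assignment.
# Usage: add_retval < sysc_foo.stp > sysc_foo.stp.new
add_retval() {
    awk '
    {
        print
        if ($0 ~ /^[ \t]*retstr = returnstr\(1\)[ \t]*$/) {
            # Reuse the indentation of the retstr line.
            indent = $0
            sub(/retstr.*/, "", indent)
            print indent "retval = returnval()"
        } else if ($0 ~ /^[ \t]*retstr = return_str\(1, ?\$ret\)[ \t]*$/) {
            indent = $0
            sub(/retstr.*/, "", indent)
            print indent "retval = $ret"
        }
    }'
}
```

Any retstr assignment written differently is passed through untouched,
so the output would still need a review for the cases the two patterns
miss.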


2) To carry over limping scripts that already broke under 4.17 and used
returnval()/returnstr() in these contexts, maybe also add to all the
sysc_* files:

     ...
     retstr = return_str(1,$ret)
  +  retval = $ret
  +  set_returnval($ret)
     ...
etc., to hide the incoming value in a new reserved field in the CONTEXT
structure.  The returnval()/returnstr() functions would be modified to
look in there.

One problem with this solution is that the set_returnval() function
would be hard to optimize away in case the end-user script doesn't call
returnval().  (This is one of the reasons we prefer to pass such values
via variables - they are optimized away.)  I suggest deprecating this
part of the behaviour in the near future, shifting users toward
retval/retstr.


3) To hide all this boilerplate ret* stuff, suggest adding macros
  @SYSCALL_RETVALSTR($return)     /* for dw aliases */
                    (returnval()) /* for nd aliases */
                    ($ret)        /* for tp aliases */

to a common .stpm file, and using them throughout sysc_* to set
all the ret* values.


- FChE
