public inbox for systemtap@sourceware.org
From: Arkady <arkady.miasnikov@gmail.com>
To: Craig Ringer <craig@2ndquadrant.com>
Cc: systemtap <systemtap@sourceware.org>
Subject: Re: Code maintenance / verbosity: macros, enums, and casts
Date: Tue, 21 Jul 2020 07:11:41 +0300
Message-ID: <CANA-60qrsYgU1x+krFPU_8zWyvPf14vgA_TW=AozwPaz3AStXg@mail.gmail.com>
In-Reply-To: <CAMsr+YF-1-CXggj9KXjKRphDk2RqP9hNqMSwusk4Y2O50MXNQw@mail.gmail.com>

Craig,

Indeed, the STAP script language lacks some (arguably a lot of) useful syntax.
I assume that developing a rich programming language for kernel
space was never a goal.
I can guess some of the reasons:
* memory management in such a language can get tricky.
* we could lose the simple translation from STAP to C.
You probably do not need a modern programming language in kernel space.
This is what I did: instead of running business logic in the STAP
script, I add events to a FIFO in shared memory.
A probe in the script is typically a one-liner that copies
arguments and return codes to the FIFO.
Most of the business logic runs in user space.
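
To make the idea concrete, here is a minimal sketch of such a FIFO in C. It
is not the code I run: the event layout, the names (struct event, fifo_push,
fifo_pop) and the slot count are made up for illustration. It assumes a
single producer and a single consumer sharing one mapped region; probes
firing on several CPUs would need one FIFO per CPU or a multi-producer
design, and code on the kernel side would use the kernel's own atomic
primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SLOTS 1024u /* power of two, so indices can be masked */

/* Hypothetical fixed-size binary record for one hooked system call. */
struct event {
    uint64_t timestamp_ns;
    uint32_t pid;
    uint32_t syscall_id;
    int64_t  retval;
    uint64_t args[4];
};

/* Lives in the shared memory region; zero-filled memory is an empty FIFO. */
struct fifo {
    _Atomic uint32_t head; /* count of events written; producer only */
    _Atomic uint32_t tail; /* count of events read; consumer only */
    struct event slots[FIFO_SLOTS];
};

/* Producer side (the probe): copy one event in, or drop it if full. */
static bool fifo_push(struct fifo *f, const struct event *ev)
{
    uint32_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);

    if (head - tail == FIFO_SLOTS)
        return false; /* full: never block inside a probe */

    f->slots[head & (FIFO_SLOTS - 1)] = *ev;
    /* Release: the record must be visible before the new head is. */
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (user space): copy one event out, or report empty. */
static bool fifo_pop(struct fifo *f, struct event *out)
{
    uint32_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&f->head, memory_order_acquire);

    if (head == tail)
        return false; /* empty */

    *out = f->slots[tail & (FIFO_SLOTS - 1)];
    /* Release: the slot may be reused only after it has been copied out. */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return true;
}
```

The probe only fills a struct event and calls the push; the user-space
process polls the pop and does everything else. Dropping events when the
FIFO is full, instead of waiting, keeps the cost inside the hooked call
bounded.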

The main pros of this approach:
* No need to patch STAP (apart from one small patch to the begin and end probes).
* The code running in the kernel remains small, which reduces latency in
the system calls I hook.
* I can use any modern programming language. I use Golang, for
example, but I started with C++ and Python.
* The overhead of outputting the STAP trace can be reduced. I use binary
data in the FIFO.
* I use custom hash tables instead of STAP maps
(https://github.com/larytet/lockfree_hashtable). The hashtable
is limited in where it can be applied, but can be faster in some situations.

Arkady.

On Tue, Jul 21, 2020 at 5:55 AM Craig Ringer <craig@2ndquadrant.com> wrote:
>
> Hi all
>
> TL;DR: I'd like to add DWARF-based @enum("foo","my.so") and am interested
> in whether that'd be potentially mergeable + advice on where in the stap
> code I might want to start.
>
> ------
>
> I'm looking for advice from people more experienced with writing and
> maintaining real world systemtap tapsets and tools to manage tapset
> verbosity and make my tapsets reasonably maintainable.
>
> As I write more stap code I'm increasingly finding that I produce a lot of
> boilerplate, especially for member access via a @cast or @var; for mapping
> enums to their long values; for mapping scalar integer macros to their long
> values; for mapping the values of various enums back to their symbolic
> names; and for comparing with C "char" literals.
>
> Constants are definitely the biggest one right now. In addition to their
> verbosity, if they're maintained in tapsets instead of looked up from the
> probed code there's a risk that the tapset's idea of a given enum or macro
> definition may not match what the code is using.
>
> I write a lot of tapset boilerplate like:
>
> @define FOO_BAR %( 0 %)
> @define FOO_BAZ %( 0x63 %) // 'c'
> @define FOO_BAK %( 0x64 %) // 'd'
>
> function foo_str:string(foo:long) {
>     if (foo == @FOO_BAR) return "BAR"
>     if (foo == @FOO_BAZ) return "BAZ"
>     if (foo == @FOO_BAK) return "BAK"
> }
>
> Add @enum("FOO_BAR", "my.so")
> ====
>
> If these are defined in the sources as an enum, it'd be great to look them
> up at stap compile time with a new @enum construct.
>
> I'm wondering how feasible this might be and where I should start looking
> if I want to try implementing it.
>
> Enums are guaranteed to be evaluated to a constant value and they're in the
> ELF .debug_info section. So there's no need for the executable to have
> -g3 / -ggdb3 debuginfo, the default is fine. And there are no problems with
> evaluating expressions, since there's a constant pre-evaluated by the
> compiler and embedded in the debuginfo.
>
> I can get enum info with
>
>    eu-readelf --debug-dump=info mylib.so
>
> though I'm still trying to understand the elfutils / libdw api well enough
> to actually access it...
>
>
> Why not @const?
> ====
>
> I'm aware of "@const" but there are a number of reasons it doesn't help me:
>
> * It needs guru mode, which is really not desirable on production systems
> (or you need to preinstall a tapset that exposes the constants which then
> defeats the point);
>
> * It doesn't provide a way to specify the header(s) to include to find the
> constant; you have to handle that externally via a tapset or an explicit
> guru-mode #include;
>
> * man stap says "headers are built with default GCC parameters"; this means
> there's no guarantee the values stap sees match what the program was
> compiled with;
>
> * It requires full headers of the probed lib/program to be available at
> probe compile time, *including* transitive dependency headers included by
> the program/lib headers, which is not always desirable or possible;
>
> * The headers must be safely include-able including any/all headers they
> include in turn, any required preprocessor definitions, etc. Some headers
> have specific order-of-inclusion rules, may define or redefine symbols that
> should not be exposed, etc;
>
> * "stap" doesn't appear to provide a simple way to specify the include path to
> search for such headers.
>
> What about macros?
> ====
>
> Macros are harder.
>
> They're only present if -g3 or -ggdb3 was used at build-time, which is
> still not the norm even though the results are much more compact now than
> they used to be. Most packages don't use dwz to compact their debuginfo
> either.
>
> Even if present in the .debug_macro ELF section, the macro definitions may
> be arbitrary C expressions. They are not guaranteed to be literal integers
> or strings. gdb knows how to evaluate expressions when attached to a
> process, but I don't think it can do so statically. So using macro
> definitions from debuginfo will only work in cases where the macro
> definition is simple. It'd still be really handy.
>
> There's
>
>     eu-readelf --debug-dump=macro some.so
>
> and there's libdw's dwarf_getmacros() etc so the foundations are there.
>
> I'd definitely want to start with @enum first though.
>
> Then only add a @dwarf_macro later, if feasible, and probably restricted to
> simple numeric or string literals.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise
