Code maintenance / verbosity: macros, enums, and casts


* Code maintenance / verbosity: macros, enums, and casts
@ 2020-07-21  2:54 Craig Ringer
  2020-07-21  4:11 ` Arkady
  2020-07-23  5:16 ` Craig Ringer
  0 siblings, 2 replies; 3+ messages in thread
From: Craig Ringer @ 2020-07-21  2:54 UTC (permalink / raw)
  To: systemtap

Hi all

TL;DR: I'd like to add DWARF-based @enum("foo","my.so") and am interested
in whether that'd be potentially mergeable + advice on where in the stap
code I might want to start.

------

I'm looking for advice from people more experienced with writing and
maintaining real world systemtap tapsets and tools to manage tapset
verbosity and make my tapsets reasonably maintainable.

As I write more stap code I'm increasingly finding that I produce a lot of
boilerplate, especially for member access via a @cast or @var, for mapping
enums to their long values; for mapping scalar integer macros to their long
values; for mapping the values of various enums back to their symbolic
names; and for comparing with C "char" literals.

Constants are definitely the biggest one right now. In addition to their
verbosity, if they're maintained in tapsets instead of looked up from the
probed code there's a risk that the tapset's idea of a given enum or macro
definition may not match what the code is using.

I write a lot of tapset boilerplate like:

@define FOO_BAR %( 0 %)
@define FOO_BAZ %( 0x63 %) // 'c'
@define FOO_BAK %( 0x64 %) // 'd'

function foo_str:string(foo:long) {
    if (foo == @FOO_BAR) return "BAR"
    if (foo == @FOO_BAZ) return "BAZ"
    if (foo == @FOO_BAK) return "BAK"
    return sprintf("UNKNOWN(%d)", foo)  // fallback for values not listed above
}
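
Boilerplate of this shape is regular enough to be generated from a single
table of names and values. A minimal sketch in Python, not part of any
existing tool: render_tapset() and its parameters are hypothetical names, and
the values still have to come from somewhere trustworthy, which is the
problem @enum is meant to solve.

```python
def render_tapset(prefix, values):
    """Render @define lines plus a <prefix>_str() lookup function as stap source.

    prefix and values are illustrative inputs: values maps each constant
    name to its integer value.
    """
    lines = [f"@define {name} %( {value} %)" for name, value in values.items()]
    lines.append("")
    lines.append(f"function {prefix.lower()}_str:string(v:long) {{")
    for name in values:
        # Strip "FOO_" from "FOO_BAR" to get the short symbolic name.
        short = name[len(prefix) + 1:] if name.startswith(prefix + "_") else name
        lines.append(f'    if (v == @{name}) return "{short}"')
    lines.append('    return sprintf("UNKNOWN(%d)", v)')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tapset("FOO", {"FOO_BAR": 0, "FOO_BAZ": 0x63, "FOO_BAK": 0x64}))
```

This removes the hand-typing but not the risk of the table drifting away from
what the probed code was actually built with.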

Add @enum("FOO_BAR", "my.so")
====

If these are defined in the sources as an enum, it'd be great to look them
up at stap compile time with a new @enum construct.

I'm wondering how feasible this might be and where I should start looking
if I want to try implementing it.

Enumerators are guaranteed to evaluate to a constant value, and they're
recorded in the ELF .debug_info section. So there's no need for the
executable to have -g3 / -ggdb3 debuginfo; the default is fine. And there are
no problems with evaluating expressions, since the constant is pre-evaluated
by the compiler and embedded in the debuginfo.

I can get enum info with

   eu-readelf --debug-dump=info mylib.so

though I'm still trying to understand the elfutils / libdw api well enough
to actually access it...
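
The enumerator entries in that dump can be scraped with a short script, which
is enough to check what a given build recorded. A rough sketch in Python: the
SAMPLE text is hand-written to resemble the dump (each DIE introduced by an
offset in square brackets, then one attribute per line), and that layout is
an assumption to verify against real output. parse_enumerators() is a
hypothetical helper, not a libdw or stap API.

```python
import re

# A DIE header line looks like " [    3a]      enumerator   abbrev: 3".
DIE_RE = re.compile(r"^\s*\[\s*[0-9a-f]+\]\s+(\w+)")
# An attribute line looks like "   const_value   (data1) 99".
ATTR_RE = re.compile(r"^\s*(\w+)\s+\(\w+\)\s+(.*)$")

SAMPLE = """\
 [    2d]    enumeration_type     abbrev: 2
             name                 (strp) "foo_kind"
             byte_size            (data1) 4
 [    3a]      enumerator           abbrev: 3
               name                 (strp) "FOO_BAR"
               const_value          (data1) 0
 [    40]      enumerator           abbrev: 3
               name                 (strp) "FOO_BAZ"
               const_value          (data1) 99
 [    46]    base_type            abbrev: 4
             byte_size            (data1) 4
             name                 (strp) "int"
"""


def parse_enumerators(dump):
    """Return {enumerator name: value} for every enumerator DIE in the dump text."""
    enums = {}
    in_enumerator = False
    name = None
    for line in dump.splitlines():
        die = DIE_RE.match(line)
        if die:
            # A new DIE starts; only attributes of enumerator DIEs are kept.
            in_enumerator = die.group(1) == "enumerator"
            name = None
            continue
        if not in_enumerator:
            continue
        attr = ATTR_RE.match(line)
        if not attr:
            continue
        key, value = attr.groups()
        if key == "name":
            name = value.strip().strip('"')
        elif key == "const_value" and name is not None:
            enums[name] = int(value.split()[0], 0)
    return enums


if __name__ == "__main__":
    print(parse_enumerators(SAMPLE))
```

Scraping text like this is only a stopgap for inspection; a real @enum would
read the same DIEs through libdw instead.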

Why not @const?
====

I'm aware of "@const" but there are a number of reasons it doesn't help me:

* It needs guru mode, which is really not desirable on production systems
(or you need to preinstall a tapset that exposes the constants which then
defeats the point);

* It doesn't provide a way to specify the header(s) to include to find the
constant; you have to handle that externally via a tapset or an explicit
guru-mode #include;

* man stap says "headers are built with default GCC parameters"; this means
there's no guarantee the values stap sees match what the program was
compiled with;

* It requires full headers of the probed lib/program to be available at
probe compile time, *including* transitive dependency headers included by
the program/lib headers, which is not always desirable or possible;

* The headers must be safely include-able including any/all headers they
include in turn, any required preprocessor definitions, etc. Some headers
have specific order-of-inclusion rules, may define or redefine symbols that
should not be exposed, etc;

* "stap" doesn't appear to provide a simple way to specify the include path
to search for such headers.

What about macros?
====

Macros are harder.

They're only present if -g3 or -ggdb3 was used at build-time, which is
still not the norm even though the results are much more compact now than
they used to be. Most packages don't use dwz to compact their debuginfo
either.

Even if present in the .debug_macro ELF section, the macro definitions may
be arbitrary C expressions. They are not guaranteed to be literal integers
or strings. gdb knows how to evaluate expressions when attached to a
process, but I don't think it can do so statically. So using macro
definitions from debuginfo will only work in cases where the macro
definition is simple. It'd still be really handy.

There's

    eu-readelf --debug-dump=macro some.so

and there's libdw's dwarf_getmacros() etc so the foundations are there.

I'd definitely want to start with @enum first though.

Then only add a @dwarf_macro later, if feasible, and probably restricted to
simple numeric or string literals.
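
To make "simple numeric or string literals" concrete, here is one possible
acceptance rule, sketched in Python. It is an illustration of the restriction
only: simple_macro_value() is a hypothetical name, the set of accepted forms
is an assumption, and anything that would need real C expression evaluation
is rejected.

```python
import re

# Decimal, octal or hex integer with an optional sign and u/l suffixes.
INT_RE = re.compile(
    r"^(?P<sign>[+-]?)(?P<digits>0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)[uUlL]*$"
)
# A double-quoted string with no escapes, and a single unescaped character.
STR_RE = re.compile(r'^"([^"\\]*)"$')
CHAR_RE = re.compile(r"^'([^'\\])'$")


def simple_macro_value(body):
    """Return an int or str for a literal macro body, else None."""
    text = body.strip()
    # Tolerate one redundant pair of parentheses, e.g. "(42)".
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1].strip()
    m = INT_RE.match(text)
    if m:
        sign = -1 if m.group("sign") == "-" else 1
        digits = m.group("digits")
        if digits.lower().startswith("0x"):
            return sign * int(digits, 16)
        if digits.startswith("0") and len(digits) > 1:
            return sign * int(digits, 8)  # C octal literal
        return sign * int(digits, 10)
    m = CHAR_RE.match(text)
    if m:
        return ord(m.group(1))
    m = STR_RE.match(text)
    if m:
        return m.group(1)
    return None  # needs expression evaluation, so out of scope


if __name__ == "__main__":
    for body in ["0x63", "(42)", "'c'", '"BAR"', "(1 << 3)", "sizeof(int)"]:
        print(repr(body), "->", repr(simple_macro_value(body)))
```

A macro whose body is rejected here would simply be an error at stap compile
time, the same as an unknown name.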

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


* Re: Code maintenance / verbosity: macros, enums, and casts
  2020-07-21  2:54 Code maintenance / verbosity: macros, enums, and casts Craig Ringer
@ 2020-07-21  4:11 ` Arkady
  2020-07-23  5:16 ` Craig Ringer
  1 sibling, 0 replies; 3+ messages in thread
From: Arkady @ 2020-07-21  4:11 UTC (permalink / raw)
  To: Craig Ringer; +Cc: systemtap

Craig,

Indeed, the STAP script language lacks some (arguably a lot of) useful syntax.
I assume that developing a rich programming language for kernel
space was never a goal.
I can guess some of the reasons:
* memory management in such a language can get tricky.
* we could lose the simple translation from STAP to C

You probably do not need a modern programming language in kernel space.
This is what I did: instead of running business logic in the STAP
script, I add events to a FIFO in shared memory.
A probe in the script is typically a one-liner that copies
arguments and return codes to the FIFO.
Most of the business logic runs in user space.

The main pros of this approach:
* No need to patch STAP (apart from one small patch to the begin and end probes)
* Code running in the kernel remains small. It reduces latency in
the system calls I hook.
* I can use any modern programming language. I use Golang, for
example, but I started with C++ and Python.
* Overhead of outputting the STAP trace can be reduced. I use binary
data in the FIFO.
* I use custom hash tables instead of STAP maps
(https://github.com/larytet/lockfree_hashtable). The hash table
suits fewer use cases, but can be faster in some situations.
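
The user-space side of such a design can be sketched as a decoder for
fixed-size binary records. This is a Python illustration only: the record
layout (timestamp, pid, probe id, return code) and the names RECORD, Event
and decode_events() are hypothetical, not the format actually used here.

```python
import struct
from typing import Iterator, NamedTuple

# Hypothetical little-endian record: u64 timestamp, u32 pid, u32 probe id,
# s64 return code. 24 bytes per event.
RECORD = struct.Struct("<QIIq")


class Event(NamedTuple):
    timestamp_ns: int
    pid: int
    probe_id: int
    retval: int


def decode_events(buf: bytes) -> Iterator[Event]:
    """Yield one Event per complete record, ignoring a trailing partial record."""
    usable = len(buf) - len(buf) % RECORD.size
    for fields in RECORD.iter_unpack(buf[:usable]):
        yield Event(*fields)


if __name__ == "__main__":
    # Two complete records followed by three bytes of a partial one.
    raw = RECORD.pack(1000, 42, 7, -2) + RECORD.pack(2000, 43, 8, 0) + b"\x01\x02\x03"
    for event in decode_events(raw):
        print(event)
```

Keeping the record fixed-size keeps the probe side to a plain copy, and all
interpretation of the fields happens outside the kernel.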

Arkady.



* Re: Code maintenance / verbosity: macros, enums, and casts
  2020-07-21  2:54 Code maintenance / verbosity: macros, enums, and casts Craig Ringer
  2020-07-21  4:11 ` Arkady
@ 2020-07-23  5:16 ` Craig Ringer
  1 sibling, 0 replies; 3+ messages in thread
From: Craig Ringer @ 2020-07-23  5:16 UTC (permalink / raw)
  To: systemtap

Replying-to-self with some notes/progress on this.

While I'm at it: having started to read more of the systemtap code
properly, I find it an absolutely miraculous tool in terms of how well it
hides the complexity it deals with from the user. Truly incredible,
thank you!

TL;DR: WIP patch at https://github.com/ringerc/systemtap-patches . Some
compile issues remain, so it's not even a PoC yet, but I think I've figured
out how it all has to work.

It didn't help that libdw itself is not exactly ... documented. For anyone
else looking for documentation on libdw, elfutils etc, this is one of the few
sources I found:
https://developer.ibm.com/technologies/systems/articles/au-dwarf-debug-format/

(Now that I think I've worked much of it out, at some stage I hope I'll be
able to come and add a few comments in relevant places to help explain what
the various visitors do, how the dwarf_query stuff works and which ones
relate to which stap code constructs, etc.)
-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
