From: Arkady
Date: Tue, 21 Jul 2020 07:11:41 +0300
Subject: Re: Code maintenance / verbosity: macros, enums, and casts
To: Craig Ringer
Cc: systemtap

Craig,

Indeed, the STAP script language lacks some (arguably a lot) of useful syntax.
I assume that developing a rich programming language for kernel space was
never a goal. I can guess some of the reasons:

* Memory management in such a language can get tricky.
* We could lose the simple translation from STAP to C.

You probably do not need a modern programming language in kernel space.
This is what I did: instead of running business logic in the STAP script, I
add events to a FIFO in shared memory. A probe in the script is typically a
one-liner that copies arguments and return codes to the FIFO. Most of the
business logic runs in user space.

The main pros of this approach:

* No need to patch STAP (there is one small patch of the begin and end
  probes).
* Code running in the kernel remains small, which reduces latency in the
  system calls I hook.
* I can use any modern programming language. I use Golang, for example, but
  I started with C++ and Python.
* The overhead of outputting the STAP trace can be reduced. I use binary data
  in the FIFO.
* I use custom hash tables instead of STAP maps
  (https://github.com/larytet/lockfree_hashtable). The hashtable limits
  applications, but can be faster in some situations.

Arkady.

On Tue, Jul 21, 2020 at 5:55 AM Craig Ringer wrote:
>
> Hi all
>
> TL;DR: I'd like to add DWARF-based @enum("foo","my.so") and am interested
> in whether that'd be potentially mergeable + advice on where in the stap
> code I might want to start.
>
> ------
>
> I'm looking for advice from people more experienced with writing and
> maintaining real world systemtap tapsets and tools to manage tapset
> verbosity and make my tapsets reasonably maintainable.
>
> As I write most stap code I'm increasingly finding that I produce a lot of
> boilerplate, especially for member access via a @cast or @var, for mapping
> enums to their long values; for mapping scalar integer macros to their long
> values; for mapping the values of various enums back to their symbolic
> names; and for comparing with C "char" literals.
>
> Constants are definitely the biggest one right now. In addition to their
> verbosity, if they're maintained in tapsets instead of looked up from the
> probed code there's a risk that the tapset's idea of a given enum or macro
> definition may not match what the code is using.
>
> I write a lot of tapset boilerplate like:
>
>     @define FOO_BAR %( 0 %)
>     @define FOO_BAZ %( 0x63 %) // 'c'
>     @define FOO_BAK %( 0x64 %) // 'd'
>
>     function foo_str:string(foo:long) {
>         if (foo == @FOO_BAR) return "BAR"
>         if (foo == @FOO_BAZ) return "BAZ"
>         if (foo == @FOO_BAK) return "BAK"
>     }
>
> Add @enum("FOO_BAR", "my.so")
> ====
>
> If these are defined in the sources as an enum, it'd be great to look them
> up at stap compile time with a new @enum construct.
>
> I'm wondering how feasible this might be and where I should start looking
> if I want to try implementing it.
>
> Enums are guaranteed to be evaluated to a constant value and they're in the
> ELF .debug_info section. So there's no need for the executable to have
> -g3 / -ggdb3 debuginfo; the default is fine. And there are no problems with
> evaluating expressions, since there's a constant pre-evaluated by the
> compiler and embedded in the debuginfo.
>
> I can get enum info with
>
>     eu-readelf --debug-dump=info mylib.so
>
> though I'm still trying to understand the elfutils / libdw api well enough
> to actually access it...
>
>
> Why not @const?
> ====
>
> I'm aware of "@const" but there are a number of reasons it doesn't help me:
>
> * It needs guru mode, which is really not desirable on production systems
>   (or you need to preinstall a tapset that exposes the constants which then
>   defeats the point);
>
> * It doesn't provide a way to specify the header(s) to include to find the
>   constant, you have to handle that externally via a tapset or an explicit
>   guru-mode #include .
>
> * man stap says "headers are built with default GCC parameters"; this means
>   there's no guarantee the values stap sees match what the program was
>   compiled with;
>
> * It requires full headers of the probed lib/program to be available at
>   probe compile time, *including* transitive dependency headers included by
>   the program/lib headers, which is not always desirable or possible;
>
> * The headers must be safely include-able including any/all headers they
>   include in turn, any required preprocessor definitions, etc. Some headers
>   have specific order-of-inclusion rules, may define or redefine symbols that
>   should not be exposed, etc;
>
> * "stap" doesn't appear to provide a simple way to specify the include path
>   to search for such headers
>
> What about macros?
> ====
>
> Macros are harder.
>
> They're only present if -g3 or -ggdb3 was used at build-time, which is
> still not the norm even though the results are much more compact now than
> they used to be. Most packages don't use dwz to compact their debuginfo
> either.
>
> Even if present in the .debug_macro ELF section, the macro definitions may
> be arbitrary C expressions. They are not guaranteed to be literal integers
> or strings. gdb knows how to evaluate expressions when attached to a
> process, but I don't think it can do so statically. So using macro
> definitions from debuginfo will only work in cases where the macro
> definition is simple. It'd still be really handy.
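
To make "simple" concrete, here is a rough sketch of a filter that accepts a macro body only when it is a lone integer, character, or string literal, and rejects everything else. It is written in Python purely for illustration. The function name simple_macro_value and the exact set of accepted literal forms are assumptions made for this sketch; nothing here is part of stap or libdw.

```python
import re

# Integer literal: optional sign, hex / octal / decimal digits, optional U/L suffixes.
_INT = re.compile(r"([-+]?)\s*(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)[uUlL]*")
# Character literal holding one plain character or one common escape.
_CHAR = re.compile(r"'(\\[ntr0\\'\"]|[^\\'])'")
# String literal without escapes, to keep the sketch small.
_STR = re.compile(r'"([^"\\]*)"')

_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "\\": "\\", "'": "'", '"': '"'}


def simple_macro_value(body):
    """Return an int or str for a literal-only macro body, else None."""
    text = body.strip()
    # Drop enclosing parentheses such as "(0x63)".
    while text.startswith("(") and text.endswith(")"):
        text = text[1:-1].strip()

    m = _INT.fullmatch(text)
    if m:
        sign, digits = m.groups()
        if digits[:2].lower() == "0x":
            value = int(digits, 16)
        elif digits.startswith("0"):
            value = int(digits, 8)  # C octal; a plain "0" also lands here
        else:
            value = int(digits, 10)
        return -value if sign == "-" else value

    m = _CHAR.fullmatch(text)
    if m:
        ch = m.group(1)
        return ord(_ESCAPES[ch[1]] if ch.startswith("\\") else ch)

    m = _STR.fullmatch(text)
    if m:
        return m.group(1)

    return None  # anything else would need a real C expression evaluator


for body in ["0", "0x63", "'d'", '"BAR"', "(FOO_BAR + 1)"]:
    print(repr(body), "->", repr(simple_macro_value(body)))
```

Returning None for anything that is not a bare literal keeps the failure mode explicit: a macro that expands to an expression is reported as unsupported rather than guessed at.
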
>
> There's
>
>     eu-readelf --debug-dump=macro some.so
>
> and there's libdw's dwarf_getmacros() etc so the foundations are there.
>
> I'd definitely want to start with @enum first though.
>
> Then only add a @dwarf_macro later, if feasible, and probably restricted to
> simple numeric or string literals.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> 2ndQuadrant - PostgreSQL Solutions for the Enterprise
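
The FIFO approach described at the top of this message could look roughly like this on the user-space side. This is a minimal sketch, not the actual implementation: the record layout (timestamp, pid, event id, return code), the event ids, and the function names are all hypothetical, and a bytes object stands in for the shared memory region. Python is used here only because it is one of the languages mentioned above.

```python
import struct
from collections import namedtuple

# Hypothetical fixed-size record written by a probe (not from the original post):
# u64 timestamp_ns, u32 pid, u32 event_id, i64 return_code; little-endian, no padding.
RECORD = struct.Struct("<QIIq")
Event = namedtuple("Event", "timestamp_ns pid event_id retcode")

# Hypothetical event ids; in practice they would mirror whatever the probes write.
EVENT_NAMES = {1: "open", 2: "close"}


def decode_events(buf):
    """Decode every complete record in buf, ignoring a trailing partial record."""
    usable = len(buf) - len(buf) % RECORD.size
    return [Event._make(fields) for fields in RECORD.iter_unpack(buf[:usable])]


def summarize(events):
    """Business logic stays in user space: count failed calls per event name."""
    failures = {}
    for ev in events:
        if ev.retcode < 0:
            name = EVENT_NAMES.get(ev.event_id, "event_%d" % ev.event_id)
            failures[name] = failures.get(name, 0) + 1
    return failures


# Simulated FIFO contents standing in for the shared memory region.
sample = (
    RECORD.pack(1000, 42, 1, 3)
    + RECORD.pack(2000, 42, 1, -2)
    + RECORD.pack(3000, 43, 2, 0)
)
events = decode_events(sample + b"\x00\x01")  # two stray bytes: a partial record
print(summarize(events))
```

With fixed-size binary records the probe side only has to copy a handful of integers, and symbolic names (the kind of enum-to-string mapping discussed in the quoted message) can be resolved in user space, where the mapping is an ordinary dictionary.
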