public inbox for systemtap@sourceware.org
* (PR11207) Macroprocessor discussion
       [not found] <f1a29b1a-adce-401a-8222-f213c8233958@zmail19.collab.prod.int.phx2.redhat.com>
@ 2012-06-12 15:39 ` Serguei Makarov
  2012-06-14  1:15   ` Frank Ch. Eigler
  0 siblings, 1 reply; 9+ messages in thread
From: Serguei Makarov @ 2012-06-12 15:39 UTC (permalink / raw)
  To: systemtap

This is a thread to keep track of further back and forth on PR11207 (http://sourceware.org/bugzilla/show_bug.cgi?id=11207), an attempt to develop (or steal) some kind of solution for preprocessor macros in systemtap code. (Trading large chunks of text back and forth on a mailing list is more convenient than trading text back and forth on a Bugzilla page.)

To begin with, here is the current ultra-custom version of the...

# SystemTap Macro Processor Syntax Proposal

The macroprocessor has both an embedded mode (which feeds text directly to the systemtap parser, retaining the original line and column coordinates for error reporting purposes) and a standalone mode (which produces text output that can be used e.g. for further processing of docstrings into documentation).

# Example Usage - Defining Multiple Probes

    /** ... doc comments common to all ip probes ... */
    %define(ipprobes, probe_name, hook4_name, hook6_name,(
      %grab_invocation_docs(probe_name, ipprotocol_name)
      %grab_definition_docs
    
      %defloc(ipprobe_body_common,(
        ... stuff common to both ipv4 and ipv6 ...
      ))
    
      /** %invocation_docs(ip,IP)
       * %definition_docs
       * ... doc comments specific to ip ... */
      probe netfilter.ip.%probe_name = netfilter.ipv4.%probe_name,
              netfilter.ipv6.%probe_name { }
    
      /** %invocation_docs(ip4,IPv4)
       * %definition_docs
       * ... doc comments specific to ipv4 ... */
      probe netfilter.ipv4.%probe_name
              = netfilter.pf("NFPROTO_IPV4").hook(%hook4_name) {
        %ipprobe_body_common
        ... stuff specific to ipv4 ...
      }

      /** %invocation_docs(ip6,IPv6)
       * %definition_docs
       * ... doc comments specific to ipv6 ... */
      probe netfilter.ipv6.%probe_name
              = netfilter.pf("NFPROTO_IPV6").hook(%hook6_name) {
        %ipprobe_body_common
        ... stuff specific to ipv6 ...
      }
    ))

    /** probe netfilter.%probe_name.pre_routing - Called before an %ipprotocol_name packet is routed */
    %ipprobes(pre_routing,"NF_INET_PRE_ROUTING","NF_IP6_PRE_ROUTING")

# Example Usage - Defining a Shorthand for a Cast Operation

    %define(FOO,ptr, @cast(%ptr, "struct foo", "/path/to/app:<sys/foo.h>") )
    bar = %FOO(p)->bar
    baz = %FOO(p)->baz

# Example Usage - Equivalent to a Standard CPP #define Macro

    %define(AREA,base,height, ((%base)*(%height)/2.0) )
    a = %AREA(2+2,5) // correctly handles precedence

# Detailed Explanation

Preprocessor Activation Sequences
    %ident  -- parameterless macro invocation
    %ident( -- macro invocation with parameter list
    %(      -- conditional expression of the form %( ... %? ... %: ... %)
    /**     -- docstring enclosure -- we can use macro invocations inside

Items inside a macro invocation's parameter list are separated by commas. To allow complex parameters which stretch over multiple lines and in turn contain commas, the macro processor respects nested parens "()" and does not end a parameter if it encounters commas inside the parens. If an opening paren "(" is the very first character of the parameter, the outermost parens are discarded:
    %ident(p1,p2) -- calls macro ident with two parameters "p1" and "p2"
    %ident((p1,p2)) -- calls macro ident with one parameter "p1,p2" (discarding outermost parens)
    %ident(p1,(p2,p3)) -- calls macro ident with two parameters "p1" and "p2,p3" (discarding outermost parens of the second)
    %ident(((p1,p2))) -- calls macro ident with one parameter "(p1,p2)" (discarding outermost parens)
    %ident(foo(p1,p2)) -- calls macro ident with one parameter "foo(p1,p2)"
    %ident( (p1,p2)) -- calls macro ident with one parameter " (p1,p2)" (keeping outermost parens)

This business with the parens allows us to write complex multiline definitions:
    %define(multiline_macro,(
      ... multiple lines of stuff go here ...
    ))

Behaviour currently undecided for the following examples:
    %ident((p1,p2)   ) -- PROBABLY DISCARD PARENS
    %ident((p1,p2)foobar) -- BEHAVIOUR UNDECIDED
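A minimal Python sketch of the splitting rule described above, for illustration only. It takes the text between an invocation's outer parens (so `%ident(p1,(p2,p3))` is passed `p1,(p2,p3)`). The two undecided cases are resolved arbitrarily here: outer parens are stripped only when the `(` at the start of a parameter is matched by the parameter's very last character, so both undecided examples keep their parens. The function names are hypothetical.

```python
def split_params(arglist: str) -> list[str]:
    """Split the text between an invocation's outer parens into parameters.

    Commas separate parameters only at paren depth 0; afterwards each
    parameter whose first character is "(" may lose its outermost parens.
    """
    params, current, depth = [], [], 0
    for ch in arglist:
        if ch == "," and depth == 0:
            params.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in macro arguments")
        current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '(' in macro arguments")
    params.append("".join(current))
    return [strip_outer_parens(p) for p in params]


def strip_outer_parens(param: str) -> str:
    """Drop the outermost parens if "(" is the first character and its
    matching ")" is the last character (an assumption for the undecided cases)."""
    if not param.startswith("("):
        return param
    depth = 0
    for i, ch in enumerate(param):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The paren opened at index 0 closes here.
                return param[1:-1] if i == len(param) - 1 else param
    return param
```

For example, `split_params("p1,(p2,p3)")` gives `["p1", "p2,p3"]`, while `split_params(" (p1,p2)")` keeps the single parameter `" (p1,p2)"` intact because of the leading space.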

The macro processor is aware of docstrings. When invoking a macro, it keeps track of the most recent docstring it saw before the invocation. If the docstring is accessed via %grab_invocation_docs or %grab_definition_docs, it is pulled out of the original source text (as though it was never there) and made available to be re-inserted elsewhere using %invocation_docs and %definition_docs. Text inside docstrings traditionally prefixes each new line with " * ", which raises a few tricky issues, but it should be feasible to magically do the right thing and allow regular macros to be used inside docstrings, e.g.:

    %define(code_snippet,(
    printf("This is sample code!\n")
    printf("It extends over multiple lines!\n")
    ))

    /** This is a docstring, handled specially by the macroprocessor.
     *
     * %code_snippet
     *
     * The snippet above magically has each line start with ' * '. */
    probe foo {
          // But we can also employ the multiline macro in regular code!
          %code_snippet
    }

    /* The IMPORTANT thing though (the actual reason for making this work)
       is that the DEFINITION of code_snippet DOES NOT have to have each
       line except the first start with " * ". */

Obviously, some nasty magic is required behind the scenes to make this work transparently. My opinion is that it can be made to work with some thought. (XXX: As a bonus, similar magic could also be performed to ensure that the indentation of macro output is at least as much as the indentation of surrounding code.)
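One way the "magic" could work, sketched in Python under simplifying assumptions: the invocation sits on a ` * ` continuation line of the docstring, and the macro's expansion is already available as plain text. Every line of the expansion after the first reuses that docstring line's leading decoration. The function name is hypothetical, and the indentation bonus mentioned above is not attempted.

```python
import re

# Leading decoration of a docstring continuation line, e.g. "     * ".
_DOC_PREFIX = re.compile(r"^(\s*\*\s?)")


def splice_into_docstring(doc_line, invocation, expansion):
    """Replace `invocation` on one docstring line with `expansion`,
    giving every additional line the same ' * ' decoration."""
    m = _DOC_PREFIX.match(doc_line)
    prefix = m.group(1) if m else ""
    # A multi-line macro body typically starts and ends with a newline.
    lines = expansion.strip("\n").split("\n")
    return doc_line.replace(invocation, ("\n" + prefix).join(lines), 1)
```

With the `code_snippet` definition above, the line `     * %code_snippet` becomes two lines, each starting with `     * `, even though the macro definition itself has no ` * ` prefixes.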

XXX: some macro %defraw or similar can be used to bypass all the magic?

Docstring considerations are only relevant for the standalone (text-generating) version of the macroprocessor. The embedded version can obviously discard all comments and special comment-handling macros.

Predefined Macros (possibility)
    %define(name, param_1, param_2, ..., param_n, macro_body)
    %defloc(name, param_1, param_2, ..., param_n, macro_body)
    %grab_invocation_docs(param_1, param_2, ..., param_n)
    -- defines a local %invocation_docs(...) macro
    %grab_definition_docs(param_1, param_2, ..., param_n)
    -- defines a local %definition_docs(...) macro
    %undef(name)

XXX: alternate naming scheme for the macros?

%define and %undef work in the obvious manner. param_1, param_2, ... give names of parameters, which are available as zero parameter macros inside the definition body:

    %define(foo,x, ((%x) + 2) )
    // Note the style of putting spaces around the body of a one-line macro.
    // This ensures the parentheses are kept, while still looking reasonable.

%defloc is a bit tricky. When it is encountered inside a macro body, the resulting definition is only valid for the duration of the macro body expansion; afterwards, the old definition is restored. It is effectively a more compact version of the m4 idiom where we 'pushdef' a macro at the beginning of a macro body and 'popdef' it at the end. (XXX: We may want to define pushdef and popdef as well at a later point.)
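The pushdef/popdef idiom that `%defloc` compresses can be sketched in Python as a macro table holding a stack of definitions per name, plus a scope for one macro-body expansion that undoes every local definition when the expansion ends. Class and method names are hypothetical; having `define` overwrite only the innermost definition is one possible reading of how `%define` and `%defloc` would interact.

```python
from contextlib import contextmanager


class MacroTable:
    """Macro definitions kept as a stack of bodies per name (m4-style)."""

    def __init__(self):
        self._stacks = {}

    def define(self, name, body):
        # %define replaces the innermost (topmost) definition, if any.
        stack = self._stacks.setdefault(name, [])
        if stack:
            stack[-1] = body
        else:
            stack.append(body)

    def undef(self, name):
        # %undef forgets every stacked definition of the name.
        self._stacks.pop(name, None)

    def pushdef(self, name, body):
        self._stacks.setdefault(name, []).append(body)

    def popdef(self, name):
        stack = self._stacks.get(name)
        if stack:
            stack.pop()
            if not stack:
                del self._stacks[name]

    def lookup(self, name):
        stack = self._stacks.get(name)
        return stack[-1] if stack else None

    @contextmanager
    def expansion_scope(self):
        """Scope of one macro-body expansion; yields a `defloc` callback
        whose definitions are popped again when the scope exits."""
        pushed = []

        def defloc(name, body):
            self.pushdef(name, body)
            pushed.append(name)

        try:
            yield defloc
        finally:
            for name in reversed(pushed):
                self.popdef(name)
```

A `%defloc` inside a macro body would then correspond to calling `defloc(...)` inside the `with table.expansion_scope()` block that wraps that body's expansion.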

%grab_invocation_docs inside a macro body grabs the docstring closest to the point where the macro is invoked and makes it available to instead be inserted whenever we use the macro %invocation_docs. (If %invocation_docs is used *inside* a docstring, it seamlessly pastes the content of one docstring into the other.)

%grab_definition_docs inside a macro body works similarly with the docstring attached to the invocation of %define that created the macro.

Both of the latter macros take parameters because we might want to repeat the same docstring several times with custom substitutions (see above). To make this work we could EITHER (XXX) delay evaluation of docstring contents until we know the context it is evaluated in (probably the better solution), OR (XXX) simply say that undefined macro invocations are left untouched. Hence docstrings can be constructed as follows:

    %define(cakery,(
      %grab_invocation_docs(foodstuff)

      /** EXTRA BLAH BLAH %invocation_docs(CHEESE) */
      probe cheese { ... }

      %invocation_docs(BEER) // this also works
      probe beer { ... }
    ))

    /** BLAH BLAH BLAH DOCUMENTATION ABOUT %foodstuff */
    %cakery

This would produce output similar to the following:

    /** EXTRA BLAH BLAH BLAH BLAH BLAH DOCUMENTATION ABOUT CHEESE */
    probe cheese { ... }

    /** BLAH BLAH BLAH DOCUMENTATION ABOUT BEER */
    probe beer { ... }


* Re: (PR11207) Macroprocessor discussion
  2012-06-12 15:39 ` (PR11207) Macroprocessor discussion Serguei Makarov
@ 2012-06-14  1:15   ` Frank Ch. Eigler
  2012-06-14 20:50     ` Serguei Makarov
  0 siblings, 1 reply; 9+ messages in thread
From: Frank Ch. Eigler @ 2012-06-14  1:15 UTC (permalink / raw)
  To: Serguei Makarov; +Cc: systemtap



Hi, Serhei -

Lots of work done there, thank you!  I suspect we would want to start
a little smaller, and see how far that can go.  I also have one or two
syntax tweaks for you to consider.


> [...]
> # Example Usage - Defining a Shorthand for a Cast Operation
>
>     %define(FOO,ptr, @cast(%ptr, "struct foo", "/path/to/app:<sys/foo.h>") )
>     bar = %FOO(p)->bar
>     baz = %FOO(p)->baz
>
> # Example Usage - Equivalent to a Standard CPP #define Macro
>
>     %define(AREA,base,height, ((%base)*(%height)/2.0) )
>     a = %AREA(2+2,5) // correctly handles precedence

These look basically good.

> # Detailed Explanation
> [...]
> This business with the parens allows us to write complex multiline
> definitions:
>     %define(multiline_macro,(
>       ... multiple lines of stuff go here ...
>     ))

Instead of using so much parenthesis/whitespace magic, how about:
To define:

%define macro2(a,b)
something(a) + something_else(b)
%end

To invoke:

%macro2(foo,bar)

(The code will have to take care w.r.t. undesirable recursion/nesting
with macroexpansion.)
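The recursion concern can be made concrete. The Python sketch below expands parameterless `%name` invocations, rescans each expansion, and tracks which macros are currently being expanded, so that direct or mutual recursion is reported instead of looping forever. It is text-based and ignores parameters to stay short; the names are hypothetical. Raising an error is a choice made here, whereas cpp instead leaves a self-referencing macro name unexpanded.

```python
import re

_INVOCATION = re.compile(r"%([A-Za-z_]\w*)")


class MacroRecursionError(Exception):
    pass


def expand(text, macros, active=()):
    """Expand %name invocations of parameterless macros in `text`.

    Each expansion is rescanned; `active` is the chain of macros currently
    being expanded, used to report recursion instead of looping forever.
    """
    def replace(m):
        name = m.group(1)
        if name not in macros:
            return m.group(0)  # not a macro: keep the text as-is
        if name in active:
            chain = " -> ".join(active + (name,))
            raise MacroRecursionError(f"recursive expansion: {chain}")
        return expand(macros[name], macros, active + (name,))

    return _INVOCATION.sub(replace, text)
```

For instance, with `{"a": "x %b y", "b": "B"}` the text `%a!` expands to `x B y!`, while `{"a": "%b", "b": "%a"}` is rejected with the chain `a -> b -> a`.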


> The macro processor is aware of docstrings. When invoking a macro,
> it keeps track of the most recent docstring it saw before the
> invocation. If the docstring is accessed via %grab_invocation_docs
> or %grab_definition_docs [...]

I wonder if, instead of this, we can extend the doc extractor widget
to accept formats that a more naive macro engine could produce.  Drop
the requirement for exact /* * foo */ patterns, accept some other,
appendable format.  Perhaps:

%define general_docs (name)
/***   <--- signal to have kerneldoc widget glue this comment to previous one
  * some more text to be added
  * @param foo bar
  */
%end

%define general_body (name1,name2)  // without cpp style # / ## token-pasting
   name1 = @var(name2)
%end

/** sfunction foobar */
%general_docs (foobar)
/*** more docs */
function foobar () {
  %general_body (foobar,"foobar")
}
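A rough Python sketch of what an extended doc extractor could do with the `/***` marker: walk the doc comments in source order and append each `/***` comment's text to the `/**` comment before it, producing an ordinary kernel-doc style comment. It assumes the comments have already been pulled out of the (macro-expanded) source as a list of strings; the function names are hypothetical and this is not kernel-doc itself.

```python
def _comment_lines(comment, opener):
    """Text lines of a comment, minus the opener, closer and ' * ' decoration."""
    inner = comment[len(opener):-len("*/")]
    lines = []
    for line in inner.split("\n"):
        line = line.strip()
        if line.startswith("*"):
            line = line[1:].strip()
        if line:
            lines.append(line)
    return lines


def glue_docstrings(comments):
    """Merge each '/***' continuation into the '/**' docstring before it."""
    merged = []
    for comment in comments:
        if not comment.startswith("/***"):
            merged.append(comment)
            continue
        if not merged:
            raise ValueError("'/***' continuation without a preceding docstring")
        extra = "".join("\n * " + line for line in _comment_lines(comment, "/***"))
        merged[-1] = merged[-1][:-len("*/")].rstrip() + extra + " */"
    return merged
```

Fed the three comments from the `foobar` example (with the `%general_docs` expansion in place), this yields a single `/** sfunction foobar ... */` comment.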


> Predefined Macros (possibility)
>     %define(name, param_1, param_2, ..., param_n, macro_body)
>     %defloc(name, param_1, param_2, ..., param_n, macro_body)
>     %grab_invocation_docs(param_1, param_2, ..., param_n)
>     -- defines a local %invocation_docs(...) macro
>     %grab_definition_docs(param_1, param_2, ..., param_n)
>     -- defines a local %definition_docs(...) macro
>     %undef(name)
> [...]

I'd start just with the first and last and see how far one can get
with only them.  If that %defloc macro is only for meta-macros, I'd
like to see whether we can get by without them.


- FChE


* Re: (PR11207) Macroprocessor discussion
  2012-06-14  1:15   ` Frank Ch. Eigler
@ 2012-06-14 20:50     ` Serguei Makarov
  2012-06-14 21:21       ` Josh Stone
  0 siblings, 1 reply; 9+ messages in thread
From: Serguei Makarov @ 2012-06-14 20:50 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Okay, another stab at reasoning this through. jistone raised the very good point on IRC that we may want to consider a macroprocessor which works on already-tokenized data. This would be very different in some respects from the text-based proposal I'd been considering.

Anyhow, the following goals/issues would be necessary to consider for either approach:
- Source Coordinates - correctly preserved for the sake of error reporting.
- Documentation Generation - macros can be used to generate custom docstrings.
- Correct Handling of Brackets - if the preprocessor syntax uses brackets {} or parens (), these interact correctly with any brackets or parens inside the macro parameters
  - by default, the preprocessor respects bracket nesting in the obvious way (brackets are expected to match)
  - the preprocessor knows about the possibility of brackets inside string literals and doesn't attempt to match them
  - there is some kind of e.g. quoting facility for emitting non-matching parens from a macro
- Explicit Macro Invocation - so far we seem to be leaning away from the implicit macro invocation style m4 (and cpp) use, where any identifier is a possible macro invocation. Instead almost all preprocessor stuff, including macro invocations, would be prefixed with a special character such as '%'.
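The bracket requirements in the list above can be sketched in Python as a stack-based checker that skips over double-quoted string literals (with backslash escapes). It is illustrative only: comments are not recognised, and the quoting facility for deliberately unmatched brackets is not modelled. The function name is hypothetical.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def check_brackets(text):
    """Raise ValueError unless (), [] and {} nest properly outside string
    literals; brackets inside "..." are ignored."""
    stack = []
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "([{":
            stack.append((ch, pos))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                raise ValueError(f"unmatched {ch!r} at offset {pos}")
            stack.pop()
    if in_string:
        raise ValueError("unterminated string literal")
    if stack:
        ch, pos = stack[-1]
        raise ValueError(f"unclosed {ch!r} at offset {pos}")
```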

These are just some haphazard notes; I'll come back to them and organize more coherently very soon :)

# Token-Based Approach

Design Challenges
- Source Coordinates - almost trivial to solve due to the tokens being tagged appropriately.
- Documentation Generation - EITHER rig the lexer to retain comments and emit lexed output back as text, OR subsume kernel-doc into the systemtap lexer.
- Correct Handling of Brackets - mostly done for us by the lexer. We still have to handle bracket balancing, EITHER counting bracket depth (and introducing a special mechanism to emit unmatched brackets) OR using some distinct bracketing syntax such as %begin ... %end.
- Explicit Macro Invocation - consists mostly of the lexer recognizing an additional macro invocation token of the form %ident.
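The lexer side of the token-based design can be sketched in Python: every token carries the line and column it came from, so error reporting survives macro expansion, and `%(`, `%?`, `%:`, `%)` and `%ident` are recognised as tokens of their own. The token kinds are made up for illustration, comments are not handled at all, and this is not stap's real lexer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # hypothetical kinds: "macro", "pp_open", "ident", "newline", "op", ...
    text: str
    line: int  # 1-based
    col: int   # 1-based


# Alternatives are tried in order, so the preprocessor tokens come before
# the catch-all single-character "op"; hence `200%(3+4)` lexes as 200
# followed by the conditional opener "%(" rather than a modulo.
_TOKEN_SPEC = [
    ("pp_open", r"%\("),
    ("pp_then", r"%\?"),
    ("pp_else", r"%:"),
    ("pp_close", r"%\)"),
    ("macro", r"%[A-Za-z_]\w*"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("number", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("newline", r"\n"),
    ("space", r"[ \t]+"),
    ("op", r"."),
]
_MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in _TOKEN_SPEC))


def tokenize(src):
    """Yield Tokens tagged with source coordinates. Blanks are dropped;
    newlines are kept as tokens for the macroprocessor's benefit."""
    line, line_start = 1, 0
    for m in _MASTER.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind != "space":
            yield Token(kind, text, line, m.start() - line_start + 1)
        # Move the coordinates past any newline inside the token (a
        # newline token, or a string literal spanning several lines).
        if "\n" in text:
            line += text.count("\n")
            line_start = m.start() + text.rfind("\n") + 1
```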

Proposed Syntax
- %define foo(param1, param2, ...) ... %end
- %undef foo
- %foo, %foo(param1, param2, ...)
- /** docstring */
- /*** docstring to attach to previous one */
- %\( , %\) or something for emitting unmatched brackets if necessary

# Text-Based Approach

This would be a macroprocessor with a standalone mode for documentation generation, and an embedded mode to be used as a preprocessing stage before the lexer.

Design Challenges
- Source Coordinates - the macro processor needs to be hooked up directly to the lexer, feeding it a suitable sequence of characters and source coordinate directives.
- Documentation Generation - the macro processor emits text that is consumed by kernel-doc. EITHER the built-in macros need to be defined to magically handle docstrings (as described in a previous email) OR we again use the /*** continuation-docstring notation fche suggested.
- Correct Handling of Brackets - in addition to balancing brackets within a macro invocation, the macroprocessor needs to recognize string constants in order to ignore the brackets within them.
- Explicit Macro Invocation - not too hard or too different from implicit invocation, really.

Proposed Syntax
- %define(foo,param1,param2,...)
- %macro foo(param1, param2, ...) { ... }
- %foo, %foo(...)
- /** docstring */
- /*** this continuation-docstring as well if necessary */
- %\( , %\) or such if necessary

# Misc

Still thinking over where these fit in:
- %( ... %? ... %: ... %) conditionals (these need access to systemtap-internal logic to be really convenient -- perhaps the standalone macroprocessor mode ignores them, while the embedded mode has callbacks into systemtap code?)
- command line arguments $1, $2, ... (on IRC it was brought to my attention that these are effectively macro-substituted in the current systemtap) -- again, handle these by giving the macroprocessor some callbacks when in embedded mode?


* Re: (PR11207) Macroprocessor discussion
  2012-06-14 20:50     ` Serguei Makarov
@ 2012-06-14 21:21       ` Josh Stone
  2012-06-15 14:37         ` Serguei Makarov
  0 siblings, 1 reply; 9+ messages in thread
From: Josh Stone @ 2012-06-14 21:21 UTC (permalink / raw)
  To: Serguei Makarov; +Cc: Frank Ch. Eigler, systemtap

On 06/14/2012 01:49 PM, Serguei Makarov wrote:
> # Token-Based Approach
[...]
> - Correct Handling of Brackets - mostly done for us by the lexer. We
> still have to handle bracket balancing, EITHER counting bracket depth
> (and introducing a special mechanism to emit unmatched brackets) OR
> using some distinct bracketing syntax such as %begin ... %end,

On IRC, I was trying to suggest %( %) as this bracketing syntax, since
it already exists for conditionals.  Then you don't have to worry about
balancing at all for ( ) { } tokens inside.  You do still have to
balance preprocessor conditionals inside, but I think that's more
logically associated.


> # Text-Based Approach
[...]
> Proposed Syntax
> - %define(foo,param1,param2,...)
> - %macro foo(param1, param2, ...) { ... }

I don't like having such different syntax for nearly the same thing.  Is
the main difference here meant to be multiline?  Also, consider the
possibility of 0-parameter, e.g. %foo and %foo() -- the difference is
minimal, but the author can use this to hint what the macro is doing.

So, how about single-line is without brackets, and multi-line uses
brackets.  Like:

%define foo1(...) contents until end of line
%define foo2(...) %( contents can
  go as long as you
  like %)
%define bar1 another one-liner, no parameters
%define bar2 %(
  multiple lines with
  no parameters
%)

I think this can work for both the token- and text-based approaches.
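A Python sketch of how these two definition forms could be told apart in a token-based preprocessor, assuming tokens are plain strings and that the lexer keeps `"\n"` tokens. A `%(` right after the name or parameter list starts the multi-line form, which ends at the matching `%)`; only `%(`/`%)` pairs are counted, so plain brackets inside the body need not balance. One ambiguity shows up immediately: since whitespace is gone after lexing, this sketch treats any `(` directly after the name as a parameter list, so a one-line parameterless body cannot start with `(`. The function name is hypothetical.

```python
def parse_define(tokens, i):
    """Parse '%define name[(params)] body' starting at tokens[i].

    Returns (name, params, body_tokens, index_after_definition).
    """
    if tokens[i] != "%define":
        raise SyntaxError("expected %define")
    name = tokens[i + 1]
    i += 2
    params = []
    # Any "(" directly after the name is taken as a parameter list.
    if i < len(tokens) and tokens[i] == "(":
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            if tokens[i] != ",":
                params.append(tokens[i])
            i += 1
        if i == len(tokens):
            raise SyntaxError(f"unterminated parameter list for {name}")
        i += 1  # skip ")"
    if i < len(tokens) and tokens[i] == "%(":
        # Multi-line form: the body runs to the matching "%)".
        i += 1
        start, depth = i, 1
        while depth:
            if i == len(tokens):
                raise SyntaxError(f"unterminated %( in definition of {name}")
            if tokens[i] == "%(":
                depth += 1
            elif tokens[i] == "%)":
                depth -= 1
            i += 1
        return name, params, tokens[start:i - 1], i
    # One-line form: the body runs to the end of the line.
    start = i
    while i < len(tokens) and tokens[i] != "\n":
        i += 1
    return name, params, tokens[start:i], i
```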


Josh


* Re: (PR11207) Macroprocessor discussion
  2012-06-14 21:21       ` Josh Stone
@ 2012-06-15 14:37         ` Serguei Makarov
  2012-06-15 16:06           ` Frank Ch. Eigler
  0 siblings, 1 reply; 9+ messages in thread
From: Serguei Makarov @ 2012-06-15 14:37 UTC (permalink / raw)
  To: Josh Stone; +Cc: Frank Ch. Eigler, systemtap

So, here is a summary of the proposal with Josh's latest suggestions incorporated:
- token-based
- separate executable is produced for generating documentation (incorporating the stap lexer + macro processor + not much else)
- lexer also emits temporary '\n' tokens (and similar) for macroprocessing purposes
- flow of data is INPUT TEXT =(lexer)=> TOKEN STREAM WITH '\n' =(macroprocessor)=> (TOKEN STREAM FOR PARSER / TEXT OUTPUT FOR DOCUMENTATION)
- it may be possible/necessary to subsume kernel-doc into the macroprocessor functionality to avoid having to reconstruct output text from a token stream

Macroprocessor syntax
- single-line definition %define foo(...) ... \n
- multiple-line definition %define foo(...) %( ... %) -- so plain '()' parens in a definition need not balance
- macro invocation %foo(...) -- so plain '()' parens in an invocation have to balance
- of course, all of the above can also be zero-parameter (without the parens)
- delete a definition with %undef foo
- conditional expression %( ... %? ... %: ... %)
- /** ... */ doc comment
- /*** ... */ doc comment continuation
- $1, $2, ... command line arguments


* Re: (PR11207) Macroprocessor discussion
  2012-06-15 14:37         ` Serguei Makarov
@ 2012-06-15 16:06           ` Frank Ch. Eigler
  2012-07-03 16:17             ` Serguei Makarov
  0 siblings, 1 reply; 9+ messages in thread
From: Frank Ch. Eigler @ 2012-06-15 16:06 UTC (permalink / raw)
  To: Serguei Makarov; +Cc: Josh Stone, systemtap


smakarov wrote:

> [...]
> - flow of data is INPUT TEXT =(lexer)=> TOKEN STREAM WITH '\n' =(macroprocessor)=> (TOKEN STREAM FOR PARSER / TEXT OUTPUT FOR DOCUMENTATION)
> [...]

If the macroprocessor sits after the lexer, it will not see comments,
and cannot produce them for the parse.


> [...]
> Macroprocessor syntax
> - single-line definition %define foo(...) ... \n
> - multiple-line definition %define foo(...) %( ... %) -- so plain '()' parens in a definition need not balance
> [...]

Looks fine.

> - macro invocation %foo(...) -- so plain '()' parens in an invocation have to balance
> - conditional expression %( ... %? ... %: ... %)
> - /** ... */ doc comment
> - /*** ... */ doc comment continuation
> - $1, $2, ... command line arguments

It is essential to spell out the composition rules of these various
preprocessing / expansion mechanisms.  In other words, can one nest
%define within %( or vice versa?  Define or expand a macro with $1
argument?  Within a comment?


- FChE


* Re: (PR11207) Macroprocessor discussion
  2012-06-15 16:06           ` Frank Ch. Eigler
@ 2012-07-03 16:17             ` Serguei Makarov
  2012-07-05  4:50               ` Frank Ch. Eigler
  0 siblings, 1 reply; 9+ messages in thread
From: Serguei Makarov @ 2012-07-03 16:17 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Josh Stone, systemtap

Current status of token-based macroprocessor. Bit rough - still poring over bits of it, but it's at the stage where I'm ready to start experimenting with implementing the basics. (Modulo any commentary that anyone has for me.)

# Proposal for embedded (token-based) macroprocessor

This macroprocessor is much more tightly coupled to the systemtap language than a text-based one would be. It fits in place of `scan_pp()` in the parser code, transforming a stream of tokens into a stream of tokens with preprocessor constructs expanded.

Documentation generation is handled by adding a separate mode to `scan_pp()`, which expands macros *including* docstrings. This mode is wrapped in a separate (platform-independent) frontend, which simply extracts the resulting docstrings and uses them as input to kernel-doc (or some other suitable documentation generator).

Macro definitions are always local to a single file; it is impossible to e.g. define a macro in a tapset and then use it in a systemtap script that includes the tapset.

# Basic Constructs

## Macro Definition (one line)

    %define macro_name(param_name1, param_name2, ...) macro_body

The body of the macro definition is expanded at the point of invocation, not when the macro is defined.

## Macro Definition (multiple lines)

    %define macro_name(param_name1, param_name2, ...) %(
      macro_body
    %)

The body of the macro definition is expanded at the point of invocation, not when the macro is defined.

Macro definitions cannot be nested (at least in the initial version of the preprocessor); this rule considerably simplifies semantics and implementation. Brackets belonging to conditionals must be balanced properly.

    // The following is an error -- brackets belong to the %define:
    %define foo %( condition %? a %: b %)

    // Use the following instead to avoid ambiguity:
    %define foo %(
        %( condition %? a %: b %)
    %)

The tricky possibility of someone trying to write a one-line conditional directly inside a macro definition is handled by assuming that a `%(` following a `%define` is automatically swallowed by the `%define`, as shown above.

## Macro Invocation

    macro_name
    macro_name(param1,param2,...)

Brackets inside a macro invocation parameter must be balanced.

Macro parameters are expanded *before* being passed to the macro itself, except in a few special cases (namely the `grab_*` macros).

Parameter names occurring inside a macro body are expanded to the original values, in the same fashion as parameterless macros.
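To make the evaluation order concrete, here is a small Python sketch of token-level expansion with implicit invocation, assuming tokens are plain strings and a macro table mapping each name to its parameter list and body tokens. Arguments are split at top-level commas (brackets must balance), expanded before substitution, and parameter names in the body are replaced like parameterless macros. The `grab_*` exceptions are not modelled, and zero-parameter macros never consume a following `(`. A name already being expanded is left alone, as cpp does. All names are hypothetical.

```python
def collect_args(tokens, i):
    """Split '( a , b )' starting at tokens[i] into argument token lists.

    Commas only separate arguments at bracket depth 0. Returns
    (args, index just past the closing ')').
    """
    if i >= len(tokens) or tokens[i] != "(":
        raise SyntaxError("expected '(' after macro name")
    args, current, depth = [], [], 0
    i += 1
    while i < len(tokens):
        t = tokens[i]
        i += 1
        if t == ")" and depth == 0:
            args.append(current)
            return args, i
        if t == "," and depth == 0:
            args.append(current)
            current = []
            continue
        if t in ("(", "[", "{"):
            depth += 1
        elif t in (")", "]", "}"):
            depth -= 1
        current.append(t)
    raise SyntaxError("unterminated macro argument list")


def expand(tokens, macros, active=frozenset()):
    """Expand implicit macro invocations in a list of string tokens.

    macros maps name -> (param_names, body_tokens).
    """
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok not in macros or tok in active:
            out.append(tok)  # ordinary token, or a macro already being expanded
            continue
        params, body = macros[tok]
        args = []
        if params:
            args, i = collect_args(tokens, i)
            if len(args) != len(params):
                raise SyntaxError(f"{tok}: expected {len(params)} arguments, got {len(args)}")
        # Arguments are fully expanded *before* substitution.
        bindings = {p: expand(a, macros, active) for p, a in zip(params, args)}
        # Parameter names in the body are replaced like parameterless macros.
        substituted = [t for b in body for t in bindings.get(b, [b])]
        # Rescan the result with this macro marked as active.
        out.extend(expand(substituted, macros, active | {tok}))
    return out
```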

## Macro Invocation -- Alternate Variant

    %macro_name
    %macro_name(param1,param2,...)

Brackets inside a macro invocation parameter must be balanced.

Macro parameters are expanded *before* being passed to the macro itself, except in a few special cases (namely the `grab_*` macros).

Parameter names occurring inside a macro body are expanded to the original values, in the same fashion as parameterless macros.

The sigil used for macro invocation here conflicts with the modulo operator `%`, but this is not a huge problem because expressions like `200%(3+4)` already trigger the current preprocessor.

## Docstring (given special treatment)

Docstrings are ignored outside of a special docstring-generation mode. In that mode, a docstring is retained in the token stream as a special kind of token.

    /** This standard docstring is recognized by the macroprocessor.
        Macros inside the docstring are expanded when marked with a
        sigil '%'. Docstrings can also be grabbed and manipulated
        using the special directives described below. */

    /*** This is also recognized as a docstring; but additionally, as
         the lexer encounters it, it is pasted together with the
         previous docstring that occurs (in the unexpanded text). */

Because the content of a docstring is arbitrary text, the token-based preprocessor is not suited to generating docstring contents in the exact same manner as in ordinary code. Instead, the sigil trick is required for a macroprocessing construct to be recognized:

    /*** To evaluate a macro 'foo', mark it with a sigil like so: %foo */

This causes the preprocessor to start consuming tokens until it has parsed an entire macro invocation. Then the invocation is expanded, and the resulting tokens are synthesized back into text.

Because the result of evaluating conditionals depends on the target system being compiled for, while documentation is supposed to be largely platform-independent, conditionals cannot occur inside docstrings.

## Manipulation of Docstrings

Evaluation of docstrings that immediately precede a macro invocation or `%define` construct is delayed. If the invocation turns out to grab the docstring, it is deleted from the token stream at that point (and reinserted wherever the invocation requires it to, as detailed below). `%define` constructs always grab the immediately preceding docstring.

The basic mechanism for retrieving a docstring is outlined below:

     grab_invocation_docs(param_name1,param_name2,...)

Found in a macro body, this grabs the docstring closest to the point where the macro was invoked, and inserts it whenever a macro of the form `invocation_docs(param1,param2,...)` is encountered. The parameters are substituted into the body of the docstring.

     grab_definition_docs(param_name1,param_name2,...)

Likewise, this grabs the docstring closest to the point where the macro was *defined*; the docstring is retrieved using the macro `definition_docs`.

The arguments to a `grab_*` macro should be single identifiers. They are not macro expanded. (See Usage Example 3 below.)

The arguments to a `definition_docs` or `invocation_docs` macro can be arbitrary text and are macro expanded as usual, but keep in mind that the text is tokenized and then de-tokenized (as explained above), which limits the amount of control one has over whitespace in the final output.
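Ignoring the tokenize/de-tokenize step, the substitution part of `grab_invocation_docs` can be sketched in Python as building an `invocation_docs` function from the grabbed docstring: each call replaces the `%name` sigils for the grabbed parameter names and leaves any other `%word` untouched. Plain string substitution is a simplification, and the names are hypothetical.

```python
import re

_SIGIL = re.compile(r"%([A-Za-z_]\w*)")


def make_invocation_docs(docstring, param_names):
    """Return a hypothetical invocation_docs(...) function for a grabbed docstring."""
    param_names = list(param_names)

    def invocation_docs(*args):
        if len(args) != len(param_names):
            raise TypeError(f"expected {len(param_names)} arguments, got {len(args)}")
        values = dict(zip(param_names, args))
        # Substitute %name for grabbed parameters; leave any other %word alone.
        return _SIGIL.sub(lambda m: values.get(m.group(1), m.group(0)), docstring)

    return invocation_docs
```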

Here is an example of how this works. Docstrings can be constructed as follows:

    %define cakery %(
        grab_invocation_docs(foodstuff)

        /** EXTRA BLAH BLAH */
        invocation_docs(CHEESE)
        /*** ADDITIONAL BLAH BLAH */
        probe cheese { ... }

        invocation_docs(BEER)
        probe beer { ... }
    %)

    /** BLAH BLAH BLAH DOCUMENTATION ABOUT %foodstuff */
    cakery

Note in particular that the three-star doc comment is to be used when extending a doc comment *in the original source text* (not in the output). The macros `invocation_docs` and `definition_docs` are smart about whether or not they occur immediately after a docstring (if they occur after a docstring, their output is glued together with that docstring).

Hence, this example produces output similar to the following:

    /** EXTRA BLAH BLAH
     * BLAH BLAH BLAH DOCUMENTATION ABOUT CHEESE */
    probe cheese { ... }

    /** BLAH BLAH BLAH DOCUMENTATION ABOUT BEER */
    probe beer { ... }

## Preprocessor Conditionals

    %( ... %? ... %: ... %)

These work identically to preprocessor conditionals in the current stap. They are not expanded within docstrings.
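For reference, a Python sketch of resolving these conditionals over a list of string tokens. The condition test is passed in as a callback, since real evaluation needs stap-internal knowledge (kernel version, architecture, and so on); the callback and function names are hypothetical. Nested conditionals in the chosen branch are resolved recursively, and in this sketch a missing `%:` branch is treated as empty.

```python
def _split_conditional(tokens, i):
    """tokens[i] is '%('. Return (cond, then, else, index after '%)')."""
    parts, current, depth = [], [], 0
    i += 1
    while i < len(tokens):
        t = tokens[i]
        i += 1
        if t == "%)" and depth == 0:
            parts.append(current)
            break
        if t in ("%?", "%:") and depth == 0:
            parts.append(current)
            current = []
            continue
        if t == "%(":
            depth += 1  # nested conditional: copied verbatim for now
        elif t == "%)":
            depth -= 1
        current.append(t)
    else:
        raise SyntaxError("unterminated %( conditional")
    if len(parts) == 2:
        parts.append([])  # no %: branch
    if len(parts) != 3:
        raise SyntaxError("malformed %( ... %? ... %: ... %) conditional")
    cond, then_part, else_part = parts
    return cond, then_part, else_part, i


def resolve_conditionals(tokens, condition_holds):
    """Replace each top-level conditional by the tokens of its chosen branch."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] != "%(":
            out.append(tokens[i])
            i += 1
            continue
        cond, then_part, else_part, i = _split_conditional(tokens, i)
        chosen = then_part if condition_holds(cond) else else_part
        out.extend(resolve_conditionals(chosen, condition_holds))
    return out
```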

## Command Line Arguments

    $1, $2, $3, ..., $#
    @1, @2, @3, ..., @#

These work identically to command line arguments in the current stap. They are in fact already handled for us at the lexer level, being a text-based feature. This is the natural choice given the way the overall parser is structured, but it does in fact permit some spectacularly odd abuses, e.g.

    $ stap -e 'probe %( kernel_v > "3.0" $1 begin %: end %) { println("foo") }' '%?'

In docstring mode, command line arguments are simply not expanded.
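The oddity above follows from argument substitution happening before `%?` is ever recognised. A Python sketch of that substitution step over raw text, assuming `$N` is replaced verbatim, `@N` becomes a string literal, and `$#`/`@#` give the argument count. Unlike the real lexer, this sketch does not skip string literals or comments, and the quote escaping for `@N` is an assumption; the function name is hypothetical.

```python
import re


def substitute_args(src, argv):
    """Replace $N / @N with the N-th command-line argument and $# / @#
    with the argument count ($ forms verbatim, @ forms as string literals)."""
    def replace(m):
        sigil, which = m.group(1), m.group(2)
        if which == "#":
            value = str(len(argv))
        else:
            n = int(which)
            if not 1 <= n <= len(argv):
                raise ValueError(f"command line argument index {n} out of range")
            value = argv[n - 1]
        if sigil == "@":
            # Assumed escaping of backslashes and quotes inside the literal.
            return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        return value

    return re.sub(r"([$@])(#|\d+)", replace, src)
```

Applied to the `stap -e` example with the single argument `%?`, the `$1` turns into `%?` before the conditional is parsed, which is what makes the abuse possible.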

# Usage Example 1 -- equivalent to a standard cpp #define macro

    %define AREA(base,height) ((base)*(height)/2.0)
    a = AREA(2+2,5) // correctly handles precedence

# Usage Example 2 -- defining a shorthand for a cast operation

    %define FOO(ptr) @cast((ptr),"struct foo","/path/to/app:<sys/foo.h>")
    bar = FOO(p)->bar
    baz = FOO(p)->baz

# Usage Example 3 -- defining multiple probes with associated docstrings

    /** ... doc comments common to all ip probes ... */
    %define make_ipprobes(probe_name, hook4_name, hook6_name) %(
        grab_invocation_docs(probe_name, ipprotocol_name) // note that probe_name is not expanded
        grab_definition_docs

        invocation_docs(ip,IP)
        definition_docs
        /*** ... doc comments specific to ip ... */
        probe netfilter.ip.probe_name = netfilter.ipv4.probe_name,
                netfilter.ipv6.probe_name { }

        invocation_docs(ipv4,IPv4)
        definition_docs
        /*** ... doc comments specific to ipv4 ... */
        probe netfilter.ipv4.probe_name
                = netfilter.pf("NFPROTO_IPV4").hook(hook4_name) {
          ... stuff specific to ipv4 ...
        }

        invocation_docs(ipv6,IPv6)
        definition_docs
        /*** ... doc comments specific to ipv6 ... */
        probe netfilter.ipv6.probe_name
                = netfilter.pf("NFPROTO_IPV6").hook(hook6_name) {
          ... stuff specific to ipv6 ...
        }
    %)

    /** probe netfilter.%probe_name.pre_routing - Called before an %ipprotocol_name packet is routed */
    make_ipprobes(pre_routing,"NF_INET_PRE_ROUTING","NF_IP6_PRE_ROUTING")


* Re: (PR11207) Macroprocessor discussion
  2012-07-03 16:17             ` Serguei Makarov
@ 2012-07-05  4:50               ` Frank Ch. Eigler
  0 siblings, 0 replies; 9+ messages in thread
From: Frank Ch. Eigler @ 2012-07-05  4:50 UTC (permalink / raw)
  To: Serguei Makarov; +Cc: Josh Stone, systemtap

Hi -

On Tue, Jul 03, 2012 at 12:17:03PM -0400, Serguei Makarov wrote:
> [...]
> Macro definitions are always local to a single file; it is
> impossible to e.g. define a macro in a tapset and then use it in a
> systemtap script that includes the tapset.

This is a problem.  One of the expected uses of this mechanism was to
let tapsets define macros for use by other scripts, such as for performing
kernel-flavoured offset_of(), container_of() type operations.


> [...] Because the result of evaluating conditionals depends on the
> target system being compiled for, while documentation is supposed to
> be largely platform-independent, conditionals cannot occur inside
> docstrings.

(This may deserve no mechanism beyond a warning.)


>     /** ... doc comments common to all ip probes ... */
>     %define make_ipprobes(probe_name, hook4_name, hook6_name) %(
>         grab_invocation_docs(probe_name, ipprotocol_name) // note that probe_name is not expanded
>         grab_definition_docs
>  [...]

This grab_* docstring magic hasn't started to appeal to me yet, FWIW.


- FChE


* Re: (PR11207) Macroprocessor discussion
       [not found] <f967988c-d2f7-49a5-b29c-0201b125db42@zmail19.collab.prod.int.phx2.redhat.com>
@ 2012-07-05 15:54 ` Serguei Makarov
  0 siblings, 0 replies; 9+ messages in thread
From: Serguei Makarov @ 2012-07-05 15:54 UTC (permalink / raw)
  To: systemtap

Frank,

Thanks for taking the time to offer feedback. There's really only one major issue blocking the design, which I discuss below. (As for the docstring-manipulation stuff not being appealing -- that's all right; it won't be in the initial prototype anyway, and we can live without it if I can't think of a nicer way to get the preprocessor to do what I wanted from that feature...)

> This is a problem.  One of the expected uses of this mechanism was to
> let tapsets define macros for use by other scripts, such as for performing
> kernel-flavoured offset_of(), container_of() type operations.

Hm, based on my reading of the code, implementing this runs into a thorny problem with the way tapset inclusion is currently arranged. From what I can tell, identifiers across files are resolved as follows:
- all of the tapset files are parsed independently of each other (see main.cxx:L) and collected in s.library_files (which implies that macroexpansion is performed on each one in isolation)
- during semantic analysis, we resolve identifiers by finding the files that define the referenced identifiers and adding them to s.files, then doing the same for whatever *those* files reference, until we reach a transitive closure
- everything that ends up in the transitive closure is compiled into the final module
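
The resolution steps above amount to a worklist computation of a transitive closure. Here is a minimal Python sketch of it. It assumes each parsed file has already been reduced to the sets of identifiers it defines and references. resolve_files and its dictionaries are hypothetical names, not systemtap's actual C++ code:

```python
from collections import deque


def resolve_files(user_refs, library_defs, library_refs):
    """Return the set of tapset files pulled in by a user script.

    user_refs:    identifiers referenced by the user script
    library_defs: tapset file -> set of identifiers it defines
    library_refs: tapset file -> set of identifiers it references
    """
    # Index which tapset defines each identifier (first one wins here).
    definer = {}
    for fname in sorted(library_defs):
        for ident in library_defs[fname]:
            definer.setdefault(ident, fname)

    selected = set()  # plays the role of s.files
    pending = deque(user_refs)
    while pending:
        ident = pending.popleft()
        fname = definer.get(ident)
        if fname is None or fname in selected:
            continue  # defined elsewhere (or nowhere), or already pulled in
        selected.add(fname)
        # Whatever the newly added tapset references must be resolved too.
        pending.extend(library_refs.get(fname, ()))
    return selected
```

Note that this loop only runs after every tapset has been parsed, and hence macroexpanded. That is too late to tell the parser which macros a given file may use.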

This is a chicken-and-egg problem. Tapsets are obviously allowed to build on one another, but we only find out which files a file depends on during semantic analysis, after everything has been parsed. To properly parse and macroexpand a file, though, we need to know which macros are defined in other files. So we need to macroexpand those files first, which in turn requires knowing which macros are pulled in from other files for *that* file...

One hack I can think of for getting around the issue is to require tapset writers to explicitly put directives such as

%include_macros("other_tapset.stp")

if they want to use macros from that tapset, so that stap has the information it needs to parse files in the correct order. (The most obvious implementation requires the parser to be able to suspend parsing one file, go off and parse another file, and then return to the first one where it left off. This seems reasonable, but some double-checking is required to be sure that nothing in parse.cxx assumes files are handled one at a time...)
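
A rough Python sketch of the suspend-and-resume idea. The recursion plays the role of suspending one file and returning to it afterwards, and a stack of suspended files catches circular directives. Several assumptions apply. visible_macros is a hypothetical name. The %include_macros and %define patterns just follow the examples in this thread. Macros from included files are assumed to be visible transitively. A real implementation would do this inside the parser on tokens, not with regular expressions on lines:

```python
import re

# Directive and definition syntax as sketched in this thread (hypothetical).
INCLUDE = re.compile(r'%include_macros\("([^"]+)"\)')
DEFINE = re.compile(r'%define\s+([A-Za-z_][A-Za-z0-9_]*)')


def visible_macros(root, sources):
    """Return {macro name: defining file} for every macro visible in `root`.

    `sources` maps file names to their text (a stand-in for reading tapsets).
    """
    table = {}
    stack = []      # files whose parsing is currently suspended
    loaded = set()  # files already fully processed

    def visit(fname):
        if fname in stack:
            raise ValueError("circular %include_macros: "
                             + " -> ".join(stack + [fname]))
        if fname in loaded:
            return
        stack.append(fname)
        for line in sources[fname].splitlines():
            inc = INCLUDE.search(line)
            if inc:
                # Suspend this file, process the included one, then resume here.
                visit(inc.group(1))
                continue
            d = DEFINE.search(line)
            if d:
                table[d.group(1)] = fname
        stack.pop()
        loaded.add(fname)

    visit(root)
    return table
```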

[There was originally a digression here about how much time is spent parsing the tapset files sequentially, but that turns out to be a small fraction of the total run, so the efficiency concerns I had in mind are basically irrelevant. (They may apply to pass 2, but that's a discussion for a different day.) And pass timings are printed automatically for verbose > 0. :-)]

The final script being compiled would be exempt from the need to use explicit macro inclusion directives, since by the time it is parsed every tapset has already been macroexpanded and there is no chicken-and-egg ambiguity.

(As a minor detail, we also have to swap the order of passes 1a and 1b: 1a currently parses the final script, while 1b parses the tapsets.)
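
As an alternative to suspend-and-resume, the "correct order" could also be computed up front as a topological sort, with the user script placed last. A minimal Python sketch follows. It assumes the %include_macros directives can be found by a cheap pre-scan before full parsing, and that every tapset appears as a key. pass1_order is a hypothetical name:

```python
from graphlib import TopologicalSorter  # Python 3.9+


def pass1_order(include_deps, user_script):
    """Hypothetical parse order for pass 1 with passes 1a and 1b swapped.

    include_deps maps every tapset to the tapsets named in its
    %include_macros directives (an empty list if none).  Each tapset is
    parsed after the tapsets whose macros it uses; the user script goes
    last, so it can use any tapset macro without directives of its own.
    """
    # static_order() raises graphlib.CycleError on circular includes.
    tapsets = list(TopologicalSorter(include_deps).static_order())
    return tapsets + [user_script]
```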

That's not a very satisfying solution. However, unless I think of something else, our design space for this corner of the problem seems to be very limited...

- Serhei

(PS: Argh, I keep forgetting to use reply to all...)


end of thread

Thread overview: 9+ messages
     [not found] <f1a29b1a-adce-401a-8222-f213c8233958@zmail19.collab.prod.int.phx2.redhat.com>
2012-06-12 15:39 ` (PR11207) Macroprocessor discussion Serguei Makarov
2012-06-14  1:15   ` Frank Ch. Eigler
2012-06-14 20:50     ` Serguei Makarov
2012-06-14 21:21       ` Josh Stone
2012-06-15 14:37         ` Serguei Makarov
2012-06-15 16:06           ` Frank Ch. Eigler
2012-07-03 16:17             ` Serguei Makarov
2012-07-05  4:50               ` Frank Ch. Eigler
     [not found] <f967988c-d2f7-49a5-b29c-0201b125db42@zmail19.collab.prod.int.phx2.redhat.com>
2012-07-05 15:54 ` Serguei Makarov
