public inbox for systemtap@sourceware.org
* [RFC] simple dprobe like markers for the kernel
@ 2008-07-09 21:23 James Bottomley
  2008-07-10  3:40 ` Mathieu Desnoyers
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2008-07-09 21:23 UTC (permalink / raw)
  To: linux-kernel, systemtap

I've been looking at using the existing in-kernel markers for dtrace-style
named probing in systemtap.  What I find is that they're a bit
heavyweight compared to what dtrace does (because of the way they
drop stubbable calling points).

This patch adds incredibly simple markers which are designed to be used
via kprobes.  All it does is add an extra section to the kernel (and
modules) which records the source file and line of each marker
and a description of the variables of interest.  Tools like systemtap
can then use the kernel dwarf2 debugging information to transform this
to a precise probe point that gives access to the named variables.
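A rough userspace sketch (not part of the patch) of what a consumer might do with a raw dump of the marker section. It assumes that each marker's five strings end up adjacent and in declaration order (name, file, line, format, args). The compiler does not promise that, so this only illustrates the string-based idea and is not a reliable parser; `struct simple_marker_rec` and `parse_simple_markers` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One marker recovered from a raw dump of the marker section.
 * Hypothetical layout: five NUL-terminated strings per marker, in the
 * order the macro declares them. */
struct simple_marker_rec {
	const char *name;
	const char *file;
	const char *line;
	const char *format;
	const char *args;
};

/* Split buf[0..len) into records of five strings each; returns the
 * number of complete records stored in out (at most max). */
static size_t parse_simple_markers(const char *buf, size_t len,
				   struct simple_marker_rec *out, size_t max)
{
	size_t n = 0, off = 0;

	while (n < max) {
		const char *field[5];
		int i;

		for (i = 0; i < 5; i++) {
			const char *nul;

			if (off >= len)
				return n;	/* no more data */
			nul = memchr(buf + off, '\0', len - off);
			if (nul == NULL)
				return n;	/* truncated record */
			field[i] = buf + off;
			off = (size_t)(nul - buf) + 1;
		}
		out[n].name = field[0];
		out[n].file = field[1];
		out[n].line = field[2];
		out[n].format = field[3];
		out[n].args = field[4];
		n++;
	}
	return n;
}
```

Because the macro uses `#format`, the stored format keeps its surrounding double quotes. A consumer would have to strip them before using the string.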

The beauty of this scheme is that it has zero cost in the unactivated
case (the extra section is discardable if you're not interested in the
information, and nothing is actually added into the routine being
marked).  The disadvantage is that it's really unusable for rolling your
own marker probes because it relies on the dwarf2 information to locate
the probe point for kprobes and unravel the local variables of interest,
so you need an external tool like systemtap to help you.

The scheme uses a printk-format-like string to describe the variables of
interest, so if those variables disappear, the compile breaks (even in
the unmarked case) which should help us keep the marked probe points
current.
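The compile-time check can be tried outside the kernel. The sketch below is a userspace analogue of the `!CONFIG_DEBUG_INFO` branch of the macro. `trace_simple_sketch`, `trace_check_format` and `dispatch_sketch` are made-up names, and the format attribute is written out directly rather than through the kernel's `__printf` wrapper.

```c
#include <assert.h>

/* Counts calls, so a test can confirm the checker never runs. */
static int check_calls;

/* Empty printf-like function: the format attribute makes GCC/Clang
 * check the argument types against the format string. */
static inline __attribute__((format(printf, 1, 2)))
void trace_check_format(const char *fmt, ...)
{
	(void)fmt;
	check_calls++;
}

/* The arguments must name real variables of matching types, but
 * "if (0)" means the call is dead code and is never executed. */
#define trace_simple_sketch(name, format, args...)			\
	do {								\
		if (0)							\
			trace_check_format(format, ## args);		\
	} while (0)

static int dispatch_sketch(void *cmd)
{
	int rtn;

	trace_simple_sketch(queuecommand, "Command being queued %p", cmd);
	rtn = 42;	/* stand-in for the real queuecommand() call */
	trace_simple_sketch(queuecommand_return,
			    "Command returning %p Return value %d", cmd, rtn);
	return rtn;
}
```

Renaming `cmd` to an undeclared identifier produces a hard compile error. Passing `rtn` against a `%p` conversion draws a `-Wformat` warning. The guarded call itself is dead code that an optimising build removes.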

For instance, this is what SCSI would look like with a probe point added
just before the command goes to the low-level device:

		trace_simple(queuecommand, "Command being queued %p Done function %p", cmd, scsi_done);
		rtn = host->hostt->queuecommand(cmd, scsi_done);
		trace_simple(queuecommand_return, "Command returning %p Return value %d", cmd, rtn);

Here you can see that each trace point describes two variables whose
values can be viewed at that point by the relevant tools.  The format
strings and variables can be used by a tool to provide functionality
like dtrace -l:

MODULE    FUNCTION          NAME                DESCRIPTION
scsi_mod  scsi_dispatch_cmd queuecommand        Command being queued $cmd; Done function $scsi_done
scsi_mod  scsi_dispatch_cmd queuecommand_return Command returning $cmd; Return value $rtn

So the trace points recommend to the user what variables to use and
briefly what they mean.
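The following is a hypothetical sketch of how a tool could derive a DESCRIPTION entry. It replaces each printf conversion in the format with `$` followed by the matching name from the stringified argument list. The sketch assumes the surrounding quotes added by `#format` have already been stripped, and it does not attempt the "; " separators used in the table. `describe` and `next_arg` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the next comma-separated name from *args into out (trimmed of
 * blanks) and advance *args past the comma. */
static void next_arg(const char **args, char *out, size_t outsz)
{
	const char *p = *args;
	const char *end;
	size_t n;

	while (*p == ' ' || *p == '\t')
		p++;
	end = strchr(p, ',');
	if (end == NULL)
		end = p + strlen(p);
	*args = (*end == ',') ? end + 1 : end;
	while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
		end--;
	n = (size_t)(end - p);
	if (n >= outsz)
		n = outsz - 1;
	memcpy(out, p, n);
	out[n] = '\0';
}

/* Write fmt into out, replacing each printf conversion with "$name",
 * taking names in order from the stringified argument list args. */
static void describe(const char *fmt, const char *args,
		     char *out, size_t outsz)
{
	size_t o = 0;
	char name[64];
	const char *q;

	while (*fmt != '\0' && o + 1 < outsz) {
		if (fmt[0] == '%' && fmt[1] == '%') {	/* literal percent */
			out[o++] = '%';
			fmt += 2;
			continue;
		}
		if (fmt[0] != '%') {
			out[o++] = *fmt++;
			continue;
		}
		fmt++;	/* skip '%', then flags, width, precision, length */
		while (*fmt != '\0' && strchr("-+ #0123456789.hlLzjt", *fmt))
			fmt++;
		if (*fmt != '\0')
			fmt++;	/* the conversion character itself */
		next_arg(&args, name, sizeof(name));
		out[o++] = '$';
		for (q = name; *q != '\0' && o + 1 < outsz; q++)
			out[o++] = *q;
	}
	out[o] = '\0';
}
```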

James

---

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index f054778..c0c38b8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -299,6 +299,8 @@
 		.debug_funcnames 0 : { *(.debug_funcnames) }		\
 		.debug_typenames 0 : { *(.debug_typenames) }		\
 		.debug_varnames  0 : { *(.debug_varnames) }		\
+		/* simple markers (depends on dwarf2 debugging info) */	\
+		__simple_marker (INFO) : { *(__simple_marker) }		\
 
 		/* Stabs debugging sections.  */
 #define STABS_DEBUG							\
diff --git a/include/linux/simple_marker.h b/include/linux/simple_marker.h
new file mode 100644
index 0000000..675f5b1
--- /dev/null
+++ b/include/linux/simple_marker.h
@@ -0,0 +1,41 @@
+#include <linux/stringify.h>
+
+/* To be used for string format validity checking with gcc */
+static inline void __printf(1, 2)
+__trace_simple_check_format(const char *fmt, ...)
+{
+}
+
+#ifdef CONFIG_DEBUG_INFO
+#define trace_simple(name, format, args...)				\
+	do {								\
+		static const char __simple_name_##name[]		\
+		__attribute__((section("__simple_marker")))		\
+		__attribute__((__used__))				\
+		= #name;						\
+		static const char __simple_file_##name[]		\
+		__attribute__((section("__simple_marker")))		\
+		__attribute__((__used__))				\
+		= __FILE__;						\
+		static const char __simple_line_##name[]		\
+		__attribute__((section("__simple_marker")))		\
+		__attribute__((__used__))				\
+		= __stringify(__LINE__);				\
+		static const char __simple_format_##name[]		\
+		__attribute__((section("__simple_marker")))		\
+		__attribute__((__used__))				\
+		= #format;						\
+		static const char __simple_args_##name[]		\
+		__attribute__((section("__simple_marker")))		\
+		__attribute__((__used__))				\
+		= #args;						\
+		if (0)							\
+			__trace_simple_check_format(format, ## args);	\
+	} while(0)
+#else
+#define trace_simple(name, format, args...)				\
+	do {								\
+		if (0)							\
+			__trace_simple_check_format(format, ## args);	\
+	 } while(0)
+#endif



* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-09 21:23 [RFC] simple dprobe like markers for the kernel James Bottomley
@ 2008-07-10  3:40 ` Mathieu Desnoyers
  2008-07-10 13:55   ` James Bottomley
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Desnoyers @ 2008-07-10  3:40 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-kernel, systemtap

* James Bottomley (James.Bottomley@HansenPartnership.com) wrote:
> I've been looking at using the existing in kernel markers for dtrace
> named probing in systemtap.  What I find is that they're a bit
> heavyweight when compared to what dtrace does (because of the way they
> drop stubbable calling points).
> 
> This patch adds incredibly simple markers which are designed to be used
> via kprobes.  All it does is add an extra section to the kernel (and
> modules) which annotates the location in source file/line of the marker
> and a description of the variables of interest.  Tools like systemtap
> can then use the kernel dwarf2 debugging information to transform this
> to a precise probe point that gives access to the named variables.
> 
> The beauty of this scheme is that it has zero cost in the unactivated
> case (the extra section is discardable if you're not interested in the
> information, and nothing is actually added into the routine being
> marked).  The disadvantage is that it's really unusable for rolling your
> own marker probes because it relies on the dwarf2 information to locate
> the probe point for kprobes and unravel the local variables of interest,
> so you need an external tool like systemtap to help you.
> 
> The scheme uses a printk format like string to describe the variables of
> interest, so if those variables disappear, the compile breaks (even in
> the unmarked case) which should help us keep the marked probe points
> current.
> 
> For instance, this is what SCSI would look like with a probe point added
> just before the command goes to the low level device
> 
> 		trace_simple(queuecommand, "Command being queued %p Done function %p", cmd, scsi_done);
> 		rtn = host->hostt->queuecommand(cmd, scsi_done);
> 		trace_simple(queuecommand_return, "Command returning %p Return value %d", cmd, rtn);
> 
> Here you can see that each trace point describes two variables whose
> values can be viewed at that point by the relevant tools.  The format
> strings and variables can be used by a tool to perform dtrace -l like
> functionality:
> 
> MODULE    FUNCTION          NAME                DESCRIPTION
> scsi_mod  scsi_dispatch_cmd queuecommand        Command being queued $cmd; Done function $scsi_done
> scsi_mod  scsi_dispatch_cmd queuecommand_return Command returning $cmd; Return value $rtn
> 
> So the trace points recommend to the user what variables to use and
> briefly what they mean.
> 
> James
> 

Hi James,

It's interesting to see this attempt at a stubless marker scheme. A few
issues, as you and Frank pointed out:
- It depends on an external tool to parse the dwarf info, so it cannot
  be used by in-kernel tracers such as ftrace.
- It does not require variable liveness at the marker site: the
  compiler can freely optimize out the variable whenever it needs to.

Besides those core concerns, I went through your patch and one small
detail seems incorrect. Please see the comment below.

> ---
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index f054778..c0c38b8 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -299,6 +299,8 @@
>  		.debug_funcnames 0 : { *(.debug_funcnames) }		\
>  		.debug_typenames 0 : { *(.debug_typenames) }		\
>  		.debug_varnames  0 : { *(.debug_varnames) }		\
> +		/* simple markers (depends on dwarf2 debugging info) */	\
> +		__simple_marker (INFO) : { *(__simple_marker) }		\
>  
>  		/* Stabs debugging sections.  */
>  #define STABS_DEBUG							\
> diff --git a/include/linux/simple_marker.h b/include/linux/simple_marker.h
> new file mode 100644
> index 0000000..675f5b1
> --- /dev/null
> +++ b/include/linux/simple_marker.h
> @@ -0,0 +1,41 @@
> +#include <linux/stringify.h>
> +
> +/* To be used for string format validity checking with gcc */
> +static inline void __printf(1, 2)
> +__trace_simple_check_format(const char *fmt, ...)
> +{
> +}
> +
> +#ifdef CONFIG_DEBUG_INFO
> +#define trace_simple(name, format, args...)				\
> +	do {								\
> +		static const char __simple_name_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= #name;						\
> +		static const char __simple_file_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= __FILE__;						\
> +		static const char __simple_line_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= __stringify(__LINE__);				\
> +		static const char __simple_format_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= #format;						\
> +		static const char __simple_args_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= #args;						\

All those variables placed in the __simple_marker section are not
guaranteed to be placed nicely together. There should be a structure
containing pointers to name, file, line, format and args strings (all
together) in a special section, and then those strings could be emitted
in another section. Otherwise, the compiler can freely choose to
interleave different strings from various tracing statements.
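The following is a minimal userspace sketch of one way to follow this suggestion. It assumes GCC or Clang with GNU ld or lld, which synthesise `__start_<sec>`/`__stop_<sec>` symbols for sections whose names are valid C identifiers. As a variant, only a pointer to each descriptor goes into the scanned section. Every entry is then exactly pointer-sized and the table can be walked as an array, while the descriptors and strings stay in ordinary data. `SIMPLE_MARKER_SKETCH`, `simple_marker_ptrs` and the helper functions are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One descriptor per marker; its strings can live anywhere. */
struct simple_marker {
	const char *name;
	const char *file;
	const char *line;
	const char *format;
	const char *args;
};

#define SM_STR_(x) #x
#define SM_STR(x) SM_STR_(x)

/* Descriptor in ordinary data, pointer to it in the scanned section. */
#define SIMPLE_MARKER_SKETCH(mname, format, args...)			\
	do {								\
		static const struct simple_marker sm_desc_##mname = {	\
			#mname, __FILE__, SM_STR(__LINE__),		\
			#format, #args					\
		};							\
		static const struct simple_marker *const sm_ptr_##mname	\
		__attribute__((section("simple_marker_ptrs"), used))	\
			= &sm_desc_##mname;				\
	} while (0)

/* Provided by the linker for the section above. */
extern const struct simple_marker *const __start_simple_marker_ptrs[];
extern const struct simple_marker *const __stop_simple_marker_ptrs[];

/* External linkage so the function (and its markers) is always emitted. */
void traced_example(void *cmd, int rtn)
{
	SIMPLE_MARKER_SKETCH(queuecommand, "Command being queued %p", cmd);
	SIMPLE_MARKER_SKETCH(queuecommand_return, "Return value %d", rtn);
	(void)cmd;
	(void)rtn;
}

static size_t marker_count(void)
{
	return (size_t)(__stop_simple_marker_ptrs - __start_simple_marker_ptrs);
}

static const struct simple_marker *find_marker(const char *name)
{
	const struct simple_marker *const *p;

	for (p = __start_simple_marker_ptrs; p < __stop_simple_marker_ptrs; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

The order of entries within the section is still up to the compiler and linker, so a consumer should look markers up by name rather than by position.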

Mathieu

> +		if (0)							\
> +			__trace_simple_check_format(format, ## args);	\
> +	} while(0)
> +#else
> +#define trace_simple(name, format, args...)				\
> +	do {								\
> +		if (0)							\
> +			__trace_simple_check_format(format, ## args);	\
> +	 } while(0)
> +#endif
> 
> 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10  3:40 ` Mathieu Desnoyers
@ 2008-07-10 13:55   ` James Bottomley
  0 siblings, 0 replies; 10+ messages in thread
From: James Bottomley @ 2008-07-10 13:55 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: linux-kernel, systemtap

On Wed, 2008-07-09 at 23:39 -0400, Mathieu Desnoyers wrote:
> * James Bottomley (James.Bottomley@HansenPartnership.com) wrote:
> > I've been looking at using the existing in kernel markers for dtrace
> > named probing in systemtap.  What I find is that they're a bit
> > heavyweight when compared to what dtrace does (because of the way they
> > drop stubbable calling points).
> > 
> > This patch adds incredibly simple markers which are designed to be used
> > via kprobes.  All it does is add an extra section to the kernel (and
> > modules) which annotates the location in source file/line of the marker
> > and a description of the variables of interest.  Tools like systemtap
> > can then use the kernel dwarf2 debugging information to transform this
> > to a precise probe point that gives access to the named variables.
> > 
> > The beauty of this scheme is that it has zero cost in the unactivated
> > case (the extra section is discardable if you're not interested in the
> > information, and nothing is actually added into the routine being
> > marked).  The disadvantage is that it's really unusable for rolling your
> > own marker probes because it relies on the dwarf2 information to locate
> > the probe point for kprobes and unravel the local variables of interest,
> > so you need an external tool like systemtap to help you.
> > 
> > The scheme uses a printk format like string to describe the variables of
> > interest, so if those variables disappear, the compile breaks (even in
> > the unmarked case) which should help us keep the marked probe points
> > current.
> > 
> > For instance, this is what SCSI would look like with a probe point added
> > just before the command goes to the low level device
> > 
> > 		trace_simple(queuecommand, "Command being queued %p Done function %p", cmd, scsi_done);
> > 		rtn = host->hostt->queuecommand(cmd, scsi_done);
> > 		trace_simple(queuecommand_return, "Command returning %p Return value %d", cmd, rtn);
> > 
> > Here you can see that each trace point describes two variables whose
> > values can be viewed at that point by the relevant tools.  The format
> > strings and variables can be used by a tool to perform dtrace -l like
> > functionality:
> > 
> > MODULE    FUNCTION          NAME                DESCRIPTION
> > scsi_mod  scsi_dispatch_cmd queuecommand        Command being queued $cmd; Done function $scsi_done
> > scsi_mod  scsi_dispatch_cmd queuecommand_return Command returning $cmd; Return value $rtn
> > 
> > So the trace points recommend to the user what variables to use and
> > briefly what they mean.
> > 
> > James
> > 
> 
> Hi James,
> 
> It's interesting to see this attempt at a stubless marker scheme. A few
> issues, as you and Frank pointed out:
> - It depends on an external tool to parse the dwarf info, so it cannot
>   be used by in-kernel tracers such as ftrace.

Actually, I think I listed that as one of the issues.

> - It does not require variable liveness at the marker site: the
>   compiler can freely optimize out the variable whenever it needs to.

Correct ... by design a zero impact probe must not affect the
optimisation.

> Besides those core concerns, I went through your patch and one small
> detail seems incorrect. Please see the comment below.
> 
> > ---
> > 
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index f054778..c0c38b8 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -299,6 +299,8 @@
> >  		.debug_funcnames 0 : { *(.debug_funcnames) }		\
> >  		.debug_typenames 0 : { *(.debug_typenames) }		\
> >  		.debug_varnames  0 : { *(.debug_varnames) }		\
> > +		/* simple markers (depends on dwarf2 debugging info) */	\
> > +		__simple_marker (INFO) : { *(__simple_marker) }		\
> >  
> >  		/* Stabs debugging sections.  */
> >  #define STABS_DEBUG							\
> > diff --git a/include/linux/simple_marker.h b/include/linux/simple_marker.h
> > new file mode 100644
> > index 0000000..675f5b1
> > --- /dev/null
> > +++ b/include/linux/simple_marker.h
> > @@ -0,0 +1,41 @@
> > +#include <linux/stringify.h>
> > +
> > +/* To be used for string format validity checking with gcc */
> > +static inline void __printf(1, 2)
> > +__trace_simple_check_format(const char *fmt, ...)
> > +{
> > +}
> > +
> > +#ifdef CONFIG_DEBUG_INFO
> > +#define trace_simple(name, format, args...)				\
> > +	do {								\
> > +		static const char __simple_name_##name[]		\
> > +		__attribute__((section("__simple_marker")))		\
> > +		__attribute__((__used__))				\
> > +		= #name;						\
> > +		static const char __simple_file_##name[]		\
> > +		__attribute__((section("__simple_marker")))		\
> > +		__attribute__((__used__))				\
> > +		= __FILE__;						\
> > +		static const char __simple_line_##name[]		\
> > +		__attribute__((section("__simple_marker")))		\
> > +		__attribute__((__used__))				\
> > +		= __stringify(__LINE__);				\
> > +		static const char __simple_format_##name[]		\
> > +		__attribute__((section("__simple_marker")))		\
> > +		__attribute__((__used__))				\
> > +		= #format;						\
> > +		static const char __simple_args_##name[]		\
> > +		__attribute__((section("__simple_marker")))		\
> > +		__attribute__((__used__))				\
> > +		= #args;						\
> 
> All those variables placed in the __simple_marker section are not
> guaranteed to be placed nicely together. There should be a structure
> containing pointers to name, file, line, format and args strings (all
> together) in a special section, and then those strings could be emitted
> in another section. Otherwise, the compiler can freely choose to
> interleave different strings from various tracing statements.

This is just an RFC and a proof of concept.  In practice the whole
section will have to be constructed so that it's versioned (in case more
information has to be added).  Realistically, string optimisation also
needs to be done (since the file name will be repeated over and over again),
which is also less easy to do.  However, the idea of using strings is
deliberate since they're easier to parse than specific structures which
have to be shared between tools (particularly when the information is
variable length).

James



* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10 15:30         ` Theodore Tso
  2008-07-10 15:57           ` James Bottomley
@ 2008-07-10 18:20           ` Frank Ch. Eigler
  1 sibling, 0 replies; 10+ messages in thread
From: Frank Ch. Eigler @ 2008-07-10 18:20 UTC (permalink / raw)
  To: Theodore Tso, James Bottomley, linux-kernel, systemtap

Hi -

On Thu, Jul 10, 2008 at 11:30:17AM -0400, Theodore Tso wrote:

> [...]  When you said a tool could determine if the tracepoint had
> gotten optimized away, or the variables were no longer present, I
> assume you meant at compile time, right?  So with the right tool
> built into the kbuild infrastructure, if we could simply print
> warnings when tracepoints had gotten optimized away [...]

It will be interesting to see how frequently such a warning appears
for a good suite of such mini markers, on a diversity of architectures
and compilers.  Such data can help pronounce judgement on this approach.


> P.S.  When you said that the current kernel markers are "a bit
> heavyweight", how bad are they in practice?  Hundreds of cycles?  More?

Good question.  The only performance measurements I have seen posted
indicate negligible effects.


- FChE


* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10 15:30         ` Theodore Tso
@ 2008-07-10 15:57           ` James Bottomley
  2008-07-10 18:20           ` Frank Ch. Eigler
  1 sibling, 0 replies; 10+ messages in thread
From: James Bottomley @ 2008-07-10 15:57 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, linux-kernel, systemtap

On Thu, 2008-07-10 at 11:30 -0400, Theodore Tso wrote:
> On Thu, Jul 10, 2008 at 09:43:16AM -0500, James Bottomley wrote:
> > No ... I'm used to optimisation strangeness.  Again, I'm not trying to
> > eliminate it because that would defeat the zero impact purpose.  I'm
> > trying to build a system that can be useful without any impact.  The
> > consequence is going to be that certain trace points can't be used
> > because of the optimiser, but that's the tradeoff.  As long as the
> > people placing the trace points are subject matter experts in the
> > subsystem (and actually using them) everything should be OK.
> 
> So as I understand things, your light-weight tracepoints are designed
> for very performance-sensitive code paths where we don't want to
> disturb the optimization in the deactivated state.  In
> non-performance sensitive parts of the kernel, where cycle counting is
> not so important, tracepoints can and probably should still be used.
> So I don't think you were proposing eliminating the current kernel
> markers in favor of this approach, yes?

That's right ... I started from the position that the current markers
were too heavy for an I/O subsystem, but I'm sure they have many other
uses.

> When you said a tool could determine if the tracepoint had gotten
> optimized away, or the variables were no longer present, I assume you
> meant at compile time, right?

Yes and no.  Yes because a tool will be able to detect the problems, but
no if you're thinking an actual kernel compile would do it (unless some
tool is designed for this and integrated into the build ... the obvious
tool is systemtap, but that might cause some heartburn).

>   So with the right tool built into the
> kbuild infrastructure, if we could simply print warnings when
> tracepoints had gotten optimized away, that would make your simple
> tracepoints quite safe for general use, I would think.

Yes ... but someone has to come up with the tool.  I suppose rebuilding
the line number matrix and finding the variables at the location is easy
mechanical dwarf stuff ... but it will give the kernel build a lot of
external dependencies it didn't have before.

Plus, this level of checking can only be done if dwarf is generated
(i.e. CONFIG_DEBUG_INFO=y).

James



* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10 14:46       ` James Bottomley
@ 2008-07-10 15:30         ` Theodore Tso
  2008-07-10 15:57           ` James Bottomley
  2008-07-10 18:20           ` Frank Ch. Eigler
  0 siblings, 2 replies; 10+ messages in thread
From: Theodore Tso @ 2008-07-10 15:30 UTC (permalink / raw)
  To: James Bottomley; +Cc: Frank Ch. Eigler, linux-kernel, systemtap

On Thu, Jul 10, 2008 at 09:43:16AM -0500, James Bottomley wrote:
> No ... I'm used to optimisation strangeness.  Again, I'm not trying to
> eliminate it because that would defeat the zero impact purpose.  I'm
> trying to build a system that can be useful without any impact.  The
> consequence is going to be that certain trace points can't be used
> because of the optimiser, but that's the tradeoff.  As long as the
> people placing the trace points are subject matter experts in the
> subsystem (and actually using them) everything should be OK.

So as I understand things, your light-weight tracepoints are designed
for very performance-sensitive code paths where we don't want to
disturb the optimization in the deactivated state.  In
non-performance sensitive parts of the kernel, where cycle counting is
not so important, tracepoints can and probably should still be used.
So I don't think you were proposing eliminating the current kernel
markers in favor of this approach, yes?

When you said a tool could determine if the tracepoint had gotten
optimized away, or the variables were no longer present, I assume you
meant at compile time, right?  So with the right tool built into the
kbuild infrastructure, if we could simply print warnings when
tracepoints had gotten optimized away, that would make your simple
tracepoints quite safe for general use, I would think.

- Ted

P.S.  When you said that the current kernel markers are "a bit
heavyweight", how bad are they in practice?  Hundreds of cycles?  More?


* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10 14:23     ` Frank Ch. Eigler
@ 2008-07-10 14:46       ` James Bottomley
  2008-07-10 15:30         ` Theodore Tso
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2008-07-10 14:46 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: linux-kernel, systemtap

On Thu, 2008-07-10 at 10:22 -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> On Thu, Jul 10, 2008 at 08:49:54AM -0500, James Bottomley wrote:
> > [...]
> > > Another disadvantage is one that came up earlier when markers were
> > > initially thought up: that something so invisible to the compiler (no
> > > code being generated in the instruction stream) may, after optimization,
> > > be impossible to locate: not just the statement but also the
> > > putative parameters.
> > 
> > Actually, I listed that one as an advantage.  But, in order to be
> > completely zero impact, the probe cannot interfere with optimisation,
> > and so you run the risk of having the probe point do strange things
> > (like it's in the middle of a loop that gets unrolled) or that the
> > variables you want to advertise get optimised away.
> > 
> > All of this is mitigated by correct selection of the probe points and
> > the variables.
> 
> Well, you can test your theory: replace some "tracepoints" or markers
> or printk's with this, and see if systemtap (or gdb) can get at the
> same data.

That's what I'm actually already doing ... so far it works nicely.

> When "correct selection" is a function of any particular compiler's
> optimization algorithms, it will be difficult for a human programmer
> to get it right.

Not necessarily.  A tracepoint next to a barrier will always be pretty much
OK, as will variables that are either passed in or passed to functions
(since they have to be instantiated to pass as arguments).

Plus, screw-ups are easily detectable by a tool that parses the dwarf.

The essential point is that we need zero impact trace points and that
makes them difficult to place in this fashion.  However, the burden of
placing and verifying them rests with the people in the actual subsystem
(who are also the ones who hopefully get the most use out of them).

> > > Long ago, someone proposed inserting asm("nop") mini-markers into
> > > the instruction stream, which could then be used as an anchor to tie a
> > > kprobe to, so that would solve the statement-location problem.
> > 
> > But you don't need a nop ... you just need a line number.
> 
> That's *if* the line number ends up being resolvable back to a PC.  In
> fact, since there is no code emitted for it, that particular line
> number will not actually appear in DWARF line records.

Erm, no ... dwarf is designed to emit an entry for every line in the
file (whether it contains a statement or not).  The empty lines get
elided in the line number program (because you can attach them to the
first statement following) but a correct parser will recover them (by
design in the dwarf).
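The following is a simplified sketch of the attach-to-the-next-statement rule described here. It assumes a tool has already decoded the line-number program for one file into (line, address) rows and sorted them by line. Real rows also carry column, is_stmt and other flags, and after optimisation one line can map to several addresses; the sketch ignores both. `struct line_row` and `resolve_line` are hypothetical. Finding an address says nothing about whether the advertised variables are live there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded line-table row for a single source file: the lowest
 * address attributed to a source line.  Hypothetical, simplified type. */
struct line_row {
	unsigned int line;
	uintptr_t addr;
};

/* Map a source line that produced no code of its own to the first row
 * whose line is >= the requested one (rows sorted by line, ascending).
 * Returns 0 when the line lies past the last row. */
static uintptr_t resolve_line(const struct line_row *rows, size_t n,
			      unsigned int line)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {		/* lower-bound binary search */
		size_t mid = lo + (hi - lo) / 2;

		if (rows[mid].line < line)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? rows[lo].addr : 0;
}
```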

> > > But it doesn't help assure that the parameters will be available in
> > > dwarf, so someone else proposed adding another asm that just asks the
> > > parameters to be evaluated and placed *somewhere*.  Each asm input
> > > constraint was to be the loosest possible, so as to not force the
> > > compiler to put the values into registers (and evict their normal
> > > tracing-ignorant tenants).
> > 
> > Actually, it does.  Assuming the probe is placed in the code by someone
> > who knows what they're doing and is using it, you can ensure that what
> > you're advertising actually exists.  [...]
> 
> You misunderstood - I am not talking about whether the variables exist
> in the context of the source code.  The question is which of those
> variables still exist, live & addressable, in the machine code and
> execution state.  You may be surprised to what extent compiler
> optimizations disrupt a simple source-level reading of the situation.

No ... I'm used to optimisation strangeness.  Again, I'm not trying to
eliminate it because that would defeat the zero impact purpose.  I'm
trying to build a system that can be useful without any impact.  The
consequence is going to be that certain trace points can't be used
because of the optimiser, but that's the tradeoff.  As long as the
people placing the trace points are subject matter experts in the
subsystem (and actually using them) everything should be OK.

> > > So that's roughly how we arrived at recent markers.  They expose to
> > > the compiler the parameters, but arrange not to evaluate them unless
> > > necessary.  The most recent markers code patches nops over most or all
> > > the hot path instructions, so there is no tangible performance impact.
> > 
> > Yes there are.  There are actually two performance impacts:
> > 
> >      1. The nops themselves take cycles to execute ... small, granted,
> >         but it adds up with lots of probe points
> >      2. The probes interfere with optimisation since to replace them
> >         with a function call, they must be barriers. [...]
> 
> That's why I qualified it with "tangible".  Please confirm your
> intuition about these costs.

1 is pretty obvious ... the nops have a defined cycle time in every
instruction set architecture.  The optimisation costs are very difficult to
quantify since they vary so much from compiler to compiler and function
to function.

James



* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10 13:51   ` James Bottomley
@ 2008-07-10 14:23     ` Frank Ch. Eigler
  2008-07-10 14:46       ` James Bottomley
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Ch. Eigler @ 2008-07-10 14:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-kernel, systemtap

Hi -

On Thu, Jul 10, 2008 at 08:49:54AM -0500, James Bottomley wrote:
> [...]
> > Another disadvantage is one that came up earlier when markers were
> > initially thought up: that something so invisible to the compiler (no
> > code being generated in the instruction stream) may, after optimization,
> > be impossible to locate: not just the statement but also the
> > putative parameters.
> 
> Actually, I listed that one as an advantage.  But, in order to be
> completely zero impact, the probe cannot interfere with optimisation,
> and so you run the risk of having the probe point do strange things
> (like it's in the middle of a loop that gets unrolled) or that the
> variables you want to advertise get optimised away.
> 
> All of this is mitigated by correct selection of the probe points and
> the variables.

Well, you can test your theory: replace some "tracepoints" or markers
or printk's with this, and see if systemtap (or gdb) can get at the
same data.

When "correct selection" is a function of any particular compiler's
optimization algorithms, it will be difficult for a human programmer
to get it right.


> > Long ago, someone proposed inserting asm("nop") mini-markers into
> > the instruction stream, which could then be used as an anchor to tie a
> > kprobe to, so that would solve the statement-location problem.
> 
> But you don't need a nop ... you just need a line number.

That's *if* the line number ends up being resolvable back to a PC.  In
fact, since there is no code emitted for it, that particular line
number will not actually appear in DWARF line records.


> > But it doesn't help assure that the parameters will be available in
> > dwarf, so someone else proposed adding another asm that just asks the
> > parameters to be evaluated and placed *somewhere*.  Each asm input
> > constraint was to be the loosest possible, so as to not force the
> > compiler to put the values into registers (and evict their normal
> > tracing-ignorant tenants).
> 
> Actually, it does.  Assuming the probe is placed in the code by someone
> who knows what they're doing and is using it, you can ensure that what
> you're advertising actually exists.  [...]

You misunderstood - I am not talking about whether the variables exist
in the context of the source code.  The question is which of those
variables still exist, live & addressable, in the machine code and
execution state.  You may be surprised to what extent compiler
optimizations disrupt a simple source-level reading of the situation.
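The following is a hedged sketch of the two earlier proposals combined, using GCC-style extended asm (also accepted by Clang). `MINI_MARKER2` and `scale` are hypothetical names. The `"g"` constraint permits a general register, memory or an immediate, so it is the loosest generic choice.

```c
#include <assert.h>

/* Hypothetical two-operand mini-marker.  The "nop" gives a kprobe a real
 * instruction to sit on; each "g" input only asks that the value be
 * available somewhere (register, memory or immediate) at this point.
 * volatile stops the compiler from discarding the statement even though
 * it has no outputs. */
#define MINI_MARKER2(a, b) \
	__asm__ __volatile__("nop" : /* no outputs */ : "g"(a), "g"(b))

static int scale(int x, int factor)
{
	int y = x * factor;

	MINI_MARKER2(x, y);	/* x and y must be materialised here */
	return y + 1;
}
```

This is the cost trade-off argued over in this thread. The nop executes on every pass, and requiring `x` and `y` to exist at that point can constrain the optimiser.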
 

> > So that's roughly how we arrived at recent markers.  They expose to
> > the compiler the parameters, but arrange not to evaluate them unless
> > necessary.  The most recent markers code patches nops over most or all
> > the hot path instructions, so there is no tangible performance impact.
> 
> Yes, there are.  There are actually two performance impacts:
> 
>      1. The nops themselves take cycles to execute ... small, granted,
>         but it adds up with lots of probe points
>      2. The probes interfere with optimisation since to replace them
>         with a function call, they must be barriers. [...]

That's why I qualified it with "tangible".  Please confirm your
intuition about these costs.
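
One rough user-space way to put a number on the first cost (executing the nops) is to time two otherwise identical loops, one with an extra nop per iteration. This is a hypothetical sketch, not a kernel measurement. The function names are made up, and it assumes GCC or Clang extended asm. It also says nothing about the second cost (lost optimisation), because both loops carry the same empty-asm barrier.

```c
#include <stdint.h>
#include <time.h>

/*
 * Both loops carry an asm barrier on acc so the compiler cannot fold
 * the loop into a closed form; the only difference is one nop per
 * iteration, standing in for a disabled marker site.
 */
uint64_t sum_plain(uint64_t n)
{
	uint64_t acc = 0;

	for (uint64_t i = 0; i < n; i++) {
		acc += i;
		__asm__ volatile("" : "+r"(acc));
	}
	return acc;
}

uint64_t sum_with_nop(uint64_t n)
{
	uint64_t acc = 0;

	for (uint64_t i = 0; i < n; i++) {
		acc += i;
		__asm__ volatile("nop" : "+r"(acc));
	}
	return acc;
}

/* CPU seconds spent in one call of fn(n), measured with clock(). */
double time_seconds(uint64_t (*fn)(uint64_t), uint64_t n)
{
	clock_t start = clock();
	volatile uint64_t sink = fn(n);	/* keep the call from being dropped */

	(void)sink;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call each function many times with a large n and compare the medians. The difference divided by n gives a rough per-nop cost on that machine. There is no guarantee it carries over to a kernel hot path.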


- FChE


* Re: [RFC] simple dprobe like markers for the kernel
  2008-07-10  2:30 ` Frank Ch. Eigler
@ 2008-07-10 13:51   ` James Bottomley
  2008-07-10 14:23     ` Frank Ch. Eigler
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2008-07-10 13:51 UTC
  To: Frank Ch. Eigler; +Cc: linux-kernel, systemtap

On Wed, 2008-07-09 at 22:29 -0400, Frank Ch. Eigler wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> > I've been looking at using the existing in kernel markers for dtrace
> > named probing in systemtap.  What I find is that they're a bit
> > heavyweight when compared to what dtrace does (because of the way
> > they drop stubbable calling points).
> 
> > This patch adds incredibly simple markers which are designed to be used
> > via kprobes [+ dwarf].  [...]
> 
> > [...]  The disadvantage is that it's really unusable for rolling
> > your own marker probes because it relies on the dwarf2 information
> > to locate the probe point for kprobes and unravel the local
> > variables of interest, so you need an external tool like systemtap
> > to help you. [...]
> 
> Another disadvantage is one that came up earlier when markers were
> initially thought up: that something so invisible to the compiler (no
> code being generated in the instruction stream) may, after
> optimization, be impossible to locate; this applies not just to the
> statement but also to the putative parameters.

Actually, I listed that one as an advantage.  But, in order to be
completely zero impact, the probe cannot interfere with optimisation,
and so you run the risk of having the probe point do strange things
(like it's in the middle of a loop that gets unrolled) or that the
variables you want to advertise get optimised away.

All of this is mitigated by correct selection of the probe points and
the variables.

> Long ago, someone proposed inserting asm("nop") mini-markers into
> the instruction stream, which could then be used as an anchor to tie a
> kprobe to, so that would solve the statement-location problem.

But you don't need a nop ... you just need a line number.

> But it doesn't help assure that the parameters will be available in
> dwarf, so someone else proposed adding another asm that just asks the
> parameters to be evaluated and placed *somewhere*.  Each asm input
> constraint was to be the loosest possible, so as to not force the
> compiler to put the values into registers (and evict their normal
> tracing-ignorant tenants).

Actually, it does.  Assuming the probe is placed in the code by someone
who knows what they're doing and is using it, you can ensure that what
you're advertising actually exists.  If you look at the SCSI example I
gave, both the probe points and the variables actually exist, and will
continue to exist because of what they are and where they're placed.
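
The zero-instruction approach can be sketched in user-space C as a macro that only emits a data record into a dedicated section. The record holds the marker name, `__FILE__`, `__LINE__` and the format string, and a DWARF-aware tool would then map it to a PC and to variable locations. This is a hypothetical reconstruction, not the code from the patch. `trace_simple_sketch`, `struct simple_marker` and the `simple_markers` section name are made up. The `__start_`/`__stop_` symbols assume the GNU linker or a compatible one.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: what a tool needs to find the site via DWARF. */
struct simple_marker {
	const char *name;
	const char *file;
	int line;
	const char *fmt;
};

/*
 * Emits a static record into the "simple_markers" section and no
 * instructions.  The dead printf() never generates code, but it makes
 * the build fail if a named variable disappears, and -Wformat checks
 * the variables' types against the format string.
 */
#define trace_simple_sketch(name, fmt, ...)				\
	do {								\
		static struct simple_marker simple_marker_##name	\
		__attribute__((used, aligned(8),			\
			       section("simple_markers"))) =		\
			{ #name, __FILE__, __LINE__, fmt };		\
		if (0)							\
			printf(fmt, __VA_ARGS__);			\
	} while (0)

/* Provided by the GNU linker for sections named like C identifiers. */
extern struct simple_marker __start_simple_markers[];
extern struct simple_marker __stop_simple_markers[];

int queue_cmd(void *cmd, int tag)
{
	trace_simple_sketch(queuecommand, "Command %p tag %d", cmd, tag);
	return tag;
}

size_t simple_marker_count(void)
{
	return (size_t)(__stop_simple_markers - __start_simple_markers);
}

/* Look a marker up by name, as an external tool might. */
const struct simple_marker *simple_marker_find(const char *name)
{
	for (size_t i = 0; i < simple_marker_count(); i++)
		if (strcmp(__start_simple_markers[i].name, name) == 0)
			return &__start_simple_markers[i];
	return NULL;
}
```

Because the records live in their own section, they can be discarded without touching the code they describe.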

> I believe this combination was never actually built/tested, partly
> because people realized that then the compiler would always have to
> evaluate parameters unconditionally, whether or not a kprobe is
> present.  (To do it otherwise would IIRC require the asm code to
> include control-flow-modification instructions, which would surprise
> gcc.)
> 
> So that's roughly how we arrived at recent markers.  They expose to
> the compiler the parameters, but arrange not to evaluate them unless
> necessary.  The most recent markers code patches nops over most or all
> the hot path instructions, so there is no tangible performance impact.

Yes, there are.  There are actually two performance impacts:

     1. The nops themselves take cycles to execute ... small, granted,
        but it adds up with lots of probe points
     2. The probes interfere with optimisation since to replace them
        with a function call, they must be barriers.

I didn't say use simple probes to replace markers ... I just said it's
an alternative for things like I/O subsystems that don't want the
perturbation.

James



* Re: [RFC] simple dprobe like markers for the kernel
       [not found] <1215638551.3444.39.camel__22002.9595810503$1215638656$gmane$org@localhost.localdomain>
@ 2008-07-10  2:30 ` Frank Ch. Eigler
  2008-07-10 13:51   ` James Bottomley
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Ch. Eigler @ 2008-07-10  2:30 UTC
  To: James Bottomley; +Cc: linux-kernel, systemtap

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> I've been looking at using the existing in kernel markers for dtrace
> named probing in systemtap.  What I find is that they're a bit
> heavyweight when compared to what dtrace does (because of the way
> they drop stubbable calling points).

> This patch adds incredibly simple markers which are designed to be used
> via kprobes [+ dwarf].  [...]

> [...]  The disadvantage is that it's really unusable for rolling
> your own marker probes because it relies on the dwarf2 information
> to locate the probe point for kprobes and unravel the local
> variables of interest, so you need an external tool like systemtap
> to help you. [...]

Another disadvantage is one that came up earlier when markers were
initially thought up: that something so invisible to the compiler (no
code being generated in the instruction stream) may, after
optimization, be impossible to locate; this applies not just to the
statement but also to the putative parameters.

Long ago, someone proposed inserting asm("nop") mini-markers into
the instruction stream, which could then be used as an anchor to tie a
kprobe to, so that would solve the statement-location problem.
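
Here is a minimal sketch of such a nop mini-marker. It assumes GCC or Clang extended asm, the GNU assembler and linker, and a 64-bit target (the `.quad` directive). Each use emits one nop and records its address in a `nop_anchors` section, which gives a tool a list of addresses it could hand to a kprobe. The macro, section and helper names are hypothetical, and registering an actual kprobe is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Emit one nop and append its address to the "nop_anchors" section.
 * "1:" labels the nop; ".quad 1b" stores that label's address.
 */
#define NOP_ANCHOR()							\
	__asm__ volatile("1:	nop\n\t"				\
			 ".pushsection nop_anchors, \"aw\"\n\t"		\
			 ".balign 8\n\t"				\
			 ".quad 1b\n\t"					\
			 ".popsection")

/* Provided by the GNU linker for sections named like C identifiers. */
extern const uint64_t __start_nop_anchors[];
extern const uint64_t __stop_nop_anchors[];

/* Number of anchor sites in the final binary. */
size_t nop_anchor_count(void)
{
	return (size_t)(__stop_nop_anchors - __start_nop_anchors);
}

/* Address of the i-th anchor: where a kprobe could be placed. */
const void *nop_anchor_addr(size_t i)
{
	return (const void *)(uintptr_t)__start_nop_anchors[i];
}

int queue_work_item(int item)
{
	NOP_ANCHOR();	/* probe site: item about to be queued */
	return item + 1;
}
```

If the compiler duplicates the surrounding code (for example by inlining or unrolling), each copy gets its own nop and its own table entry. The table can therefore hold more than one address per source-level anchor.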

But it doesn't help assure that the parameters will be available in
dwarf, so someone else proposed adding another asm that just asks the
parameters to be evaluated and placed *somewhere*.  Each asm input
constraint was to be the loosest possible, so as to not force the
compiler to put the values into registers (and evict their normal
tracing-ignorant tenants).
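
That second asm can be sketched as an empty `asm volatile` whose only operands are the values to expose. `EVAL_ANCHOR` and `checksum_step` are hypothetical names. The `"g"` constraint (any general register, memory operand or immediate) is GCC's loosest general-purpose constraint, so the compiler can leave each value wherever it already lives. Whether the value then ends up describable in DWARF still depends on the compiler.

```c
/*
 * Empty template: the asm itself emits no instructions, but x must be
 * computed and available (in a register, in memory or as a constant)
 * at this point.
 */
#define EVAL_ANCHOR(x) __asm__ volatile("" : : "g"(x))

int checksum_step(int acc, int byte)
{
	int next = acc * 31 + byte;
	int weight = byte * byte;	/* only of interest to a probe */

	EVAL_ANCHOR(next);
	EVAL_ANCHOR(weight);	/* forces weight to be computed anyway */
	return next & 0xffff;
}
```

Without the anchor, `weight` would be dead code. With it, the multiplication happens on every call whether or not a probe is attached, which is the unconditional-evaluation cost discussed in the following paragraph.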

I believe this combination was never actually built/tested, partly
because people realized that then the compiler would always have to
evaluate parameters unconditionally, whether or not a kprobe is
present.  (To do it otherwise would IIRC require the asm code to
include control-flow-modification instructions, which would surprise
gcc.)
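
For reference, GCC later gained this capability. `asm goto` (GCC 4.5, released in 2010, after this thread) lets an asm statement name C labels it may jump to, and the kernel's later jump-label code builds on it. The following is a hypothetical sketch assuming GCC 4.5+ or Clang 9+. The hot path is a single nop, and the probe's argument is evaluated only on the out-of-line path, which a runtime patcher (not shown) would enable by rewriting the nop into a jump.

```c
static int probe_hits;	/* times the probe path ran */
static int probe_last;	/* last value handed to the probe */

static void probe_handler(int value)
{
	probe_hits++;
	probe_last = value;
}

int process(int x)
{
	/*
	 * Execution always falls through this nop.  Listing do_probe
	 * tells the compiler that control may also arrive there, so it
	 * keeps that path valid.  A real patcher would need a nop wide
	 * enough to be overwritten by a jump instruction.
	 */
	__asm__ goto("nop" : : : : do_probe);
	return x + 1;

do_probe:
	probe_handler(x * x);	/* argument evaluated only on this path */
	return x + 1;
}
```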

So that's roughly how we arrived at recent markers.  They expose to
the compiler the parameters, but arrange not to evaluate them unless
necessary.  The most recent markers code patches nops over most or all
the hot path instructions, so there is no tangible performance impact.
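
Here is a much-simplified model, in plain C, of the property described above: arguments are visible to the compiler but evaluated only when a probe is attached. The names are hypothetical, and this is not the kernel markers API. Real markers, as described here, patch the hot path, whereas this sketch tests a flag in memory.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_marker {
	bool enabled;
	void (*probe)(int value);
};

/* The argument expression sits inside the guarded branch. */
#define DEMO_MARK(m, arg)						\
	do {								\
		if (__builtin_expect((m)->enabled, 0) && (m)->probe)	\
			(m)->probe(arg);				\
	} while (0)

static struct demo_marker io_marker;	/* starts disabled */
static int arg_evaluations;		/* counts argument computations */
static int last_probe_value = -1;

static int expensive_arg(int x)
{
	arg_evaluations++;
	return x * x;
}

static void record_probe(int value)
{
	last_probe_value = value;
}

int do_io(int x)
{
	DEMO_MARK(&io_marker, expensive_arg(x));
	return x + 1;
}

void demo_marker_attach(void)
{
	io_marker.probe = record_probe;
	io_marker.enabled = true;
}
```

While the marker is disabled, `expensive_arg()` is never called. After `demo_marker_attach()`, each `do_io()` call evaluates it once and passes the result to the probe.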


- FChE


Thread overview: 10+ messages
2008-07-09 21:23 [RFC] simple dprobe like markers for the kernel James Bottomley
2008-07-10  3:40 ` Mathieu Desnoyers
2008-07-10 13:55   ` James Bottomley
     [not found] <1215638551.3444.39.camel__22002.9595810503$1215638656$gmane$org@localhost.localdomain>
2008-07-10  2:30 ` Frank Ch. Eigler
2008-07-10 13:51   ` James Bottomley
2008-07-10 14:23     ` Frank Ch. Eigler
2008-07-10 14:46       ` James Bottomley
2008-07-10 15:30         ` Theodore Tso
2008-07-10 15:57           ` James Bottomley
2008-07-10 18:20           ` Frank Ch. Eigler
