[RFC] Proposal of marker implementation

public inbox for systemtap@sourceware.org

* [RFC] Proposal of marker implementation
@ 2006-08-09  5:33 Masami Hiramatsu
  2006-08-10  2:37 ` Frank Ch. Eigler
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Masami Hiramatsu @ 2006-08-09  5:33 UTC (permalink / raw)
  To: SystemTAP; +Cc: Yumiko Sugita, Satoshi Oshima, Hideo Aoki

Hi,

I'd like to propose the marker idea that I talked about at OLS.
It is based on ELF binary sections and djprobe.

Here is the concept code for the i386 architecture.
---
#define __MARKER_NOP(name) \
        asm volatile ("771:\n\t" ASM_NOP6 "\n772:\n"            \
                      ".section .markers,\"a\"\n"               \
                      "  .align 4\n"                            \
                      "  .long 771b\n"            /* label */   \
                      "  .byte 772b-771b\n"       /* length */  \
                      "  .string \"" #name "\"\n" /* name */    \
                      ".previous\n"                             \
                      ::: "memory")

#define MARKER(n) __MARKER_NOP(n)
---

This code is modeled on the "alternative" macros in asm/alternative.h.
We can extract the marker information with the readelf command, as shown below:

---
$ readelf -x ".markers" marker.ko
Hex dump of section '.markers':
  0x00000000 00000010 00000074 696e6906 00000003 .....init.......
  0x00000010                       0074 69786506 .exit.
---

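This section contains two markers (this readelf version prints the hex bytes
of each row from the highest offset to the lowest, so the fields read right
to left):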
mark address = 00000003
mark length  = 06
marker name  = 696e697400 = "init"

mark address = 00000010
mark length  = 06
marker name  = 6578697400 = "exit"
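
As a rough illustration of this record layout, the following C sketch decodes a raw .markers image into (address, length, name) records. It assumes a little-endian 32-bit address, a one-byte length, a NUL-terminated name, and 4-byte alignment between records, as implied by the macro above. The names marker_rec, parse_markers, and sample_markers are hypothetical and not part of the proposal; the sample bytes are the dump above written in ascending offset order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded record; the field names are illustrative only. */
struct marker_rec {
    uint32_t    addr;  /* ".long 771b": address of the NOP */
    uint8_t     len;   /* ".byte 772b-771b": length of the NOP */
    const char *name;  /* ".string": points into the section image */
};

/* The 22 bytes of the dump above, in ascending offset order. */
static const uint8_t sample_markers[] = {
    0x03, 0x00, 0x00, 0x00, 0x06, 'i', 'n', 'i', 't', 0x00,
    0x00, 0x00,                       /* padding from ".align 4" */
    0x10, 0x00, 0x00, 0x00, 0x06, 'e', 'x', 'i', 't', 0x00,
};

/*
 * Decode up to max records from a raw .markers image.
 * Returns the number of records decoded; stops at the first truncated one.
 */
static size_t parse_markers(const uint8_t *sec, size_t size,
                            struct marker_rec *out, size_t max)
{
    size_t off = 0, n = 0;

    assert(sec != NULL && out != NULL);

    /* A record needs 4 address bytes, 1 length byte and at least a NUL. */
    while (n < max && off + 5 < size) {
        const uint8_t *p = sec + off;
        const uint8_t *nul = memchr(p + 5, '\0', size - off - 5);

        if (nul == NULL)
            break;                      /* name is not terminated */

        out[n].addr = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                      (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        out[n].len  = p[4];
        out[n].name = (const char *)(p + 5);
        n++;

        off = (size_t)(nul - sec) + 1;  /* first byte after the NUL */
        off = (off + 3) & ~(size_t)3;   /* skip the ".align 4" padding */
    }
    return n;
}
```

Calling parse_markers(sample_markers, sizeof sample_markers, recs, 4) on the sample image yields the two records listed above.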

We can check these addresses with the objdump command:
---
$ objdump -d marker.ko
marker.ko:     file format elf32-i386

Disassembly of section .text:

00000000 <init_module>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi <--- here is "init" marker
   9:	31 c0                	xor    %eax,%eax
   b:	5d                   	pop    %ebp
   c:	c3                   	ret

0000000d <cleanup_module>:
   d:	55                   	push   %ebp
   e:	89 e5                	mov    %esp,%ebp
  10:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi <--- here is "exit" marker
  16:	5d                   	pop    %ebp
  17:	c3                   	ret
---
Each marker is a 6-byte NOP. I think we can safely replace it with a jump
instruction by using djprobe. Note: we cannot replace it with a "call"
instruction, because the called code would clobber some caller-saved registers.
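
To make the replacement concrete, here is a minimal C sketch of the bytes that could be written over a 6-byte marker slot: a 5-byte "jmp rel32" followed by a 1-byte NOP. It only shows the instruction encoding; how djprobe performs the write safely on a live kernel is not shown, and the names encode_marker_jump and marker_resume_addr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MARKER_SLOT_LEN 6   /* length of the NOP reserved by the marker */

/*
 * Build the 6 bytes that would replace the marker NOP at slot_addr:
 * "jmp rel32" (opcode 0xE9) to target, padded with a 1-byte NOP (0x90).
 * The displacement is relative to the end of the jmp (slot_addr + 5).
 */
static void encode_marker_jump(uint32_t slot_addr, uint32_t target,
                               uint8_t out[MARKER_SLOT_LEN])
{
    /* Unsigned wrap-around gives the right two's-complement value
       for backward jumps as well. */
    uint32_t rel = target - (slot_addr + 5);

    assert(out != NULL);

    out[0] = 0xE9;
    out[1] = (uint8_t)(rel);
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
    out[5] = 0x90;
}

/* A handler reached by the jump would resume right after the slot. */
static uint32_t marker_resume_addr(uint32_t slot_addr)
{
    return slot_addr + MARKER_SLOT_LEN;
}
```

For the "init" marker at address 0x3, the resume address is 0x9, which is the xor instruction in the disassembly above.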

I think this marker has many advantages.
- Minimal overhead (don't touch any memory if it is deactivated)
- Multiple markers can have the same name (we can activate those at once)
- There are no additional marking symbols in the kernel.
- Easily extensible format (we can add some additional information in the section)

It also has some disadvantages.
- Architecture dependency (but the "section" itself doesn't depend on the arch)
- Needs djprobe for safety (we can use kprobes until djprobe is ready)

Any comments are welcome.

By the way, the flight recorder patches and the STPtracer package
(a re-implementation of LKST on SystemTap) are now under preliminary
review by our PR members, so I'll send them the week after next.

Best regards,
-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

* Re: [RFC] Proposal of marker implementation
  2006-08-09  5:33 [RFC] Proposal of marker implementation Masami Hiramatsu
@ 2006-08-10  2:37 ` Frank Ch. Eigler
  2006-08-10  4:15 ` Nicholas Miell
  2006-08-31 17:52 ` Frank Ch. Eigler
  2 siblings, 0 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2006-08-10  2:37 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: SystemTAP, Yumiko Sugita, Satoshi Oshima, Hideo Aoki

Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:

> I'd like to propose the marker idea that I talked about at OLS.
> It is based on ELF binary sections and djprobe.

Clever.

> [...]  Each marker is a 6-byte NOP. I think we can safely replace it
> with a jump instruction by using djprobe. [...]

If you don't use a call, is it still straightforward to eventually
return to the proper spot to resume execution?

> I think this marker has many advantages.
> - Minimal overhead (don't touch any memory if it is deactivated)

True.

> - Multiple markers can have the same name (we can activate those at once)

The same is true of the conditional-call markers.

> - There are no additional marking symbols in the kernel.

True, but this is not necessarily a good thing.  The conditional-call
markers keep kernel symbols so that no offline or debugging
information is needed to connect to them.

> - Easily extensible format (we can add some additional information in the section)

The same is true of the conditional-call markers.

> Also has some disadvantages.
> - Architecture dependency (but the "section" itself doesn't depend on the arch)

Indeed.

> - Need djprobe for safety (we can use kprobes until it goes fine)

Indeed.  Can djprobes do this kind of instruction replacement safely
for all kernel configurations (smp, preempt)?  I was under the
impression that there were unsolvable complications e.g. if the nops
span cache lines.

Another big disadvantage: no way to pass parameters.

- FChE

* Re: [RFC] Proposal of marker implementation
  2006-08-09  5:33 [RFC] Proposal of marker implementation Masami Hiramatsu
  2006-08-10  2:37 ` Frank Ch. Eigler
@ 2006-08-10  4:15 ` Nicholas Miell
  2006-08-21  1:58   ` Masami Hiramatsu
  2006-08-31 17:52 ` Frank Ch. Eigler
  2 siblings, 1 reply; 8+ messages in thread
From: Nicholas Miell @ 2006-08-10  4:15 UTC (permalink / raw)
  To: systemtap

On Wed, 09 Aug 2006 14:33:11 +0900, Masami Hiramatsu wrote:

> Each marker is a 6-byte NOP. I think we can safely replace it with a jump
> instruction by using djprobe. Note: we cannot replace it with a "call"
> instruction, because the called code would clobber some caller-saved registers.

You can use a call instruction if the function you're calling is an
assembly stub which saves caller-saved registers and then calls a C
function (passing the saved return address as a parameter so the probe
point can be located).

Alternately, you could do a normal C function call and then replace
the call instruction with NOPs as a post-processing step. This method lets
the C compiler save the caller-saved registers for you, and you can easily
pass parameters.
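
The post-processing step described above could look roughly like the following C sketch, which overwrites a 5-byte "call rel32" in a copy of the text with NOPs and keeps the original bytes so the call can be restored when the marker is activated. This is only an illustration: the function names are hypothetical, single-byte NOPs (0x90) are used for simplicity, and a real tool working on an unlinked object would also have to deal with the relocation attached to the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_REL32_OPCODE 0xE8
#define CALL_REL32_LEN    5

/*
 * Overwrite the "call rel32" at text[off] with single-byte NOPs.
 * If saved is non-NULL, the original 5 bytes are copied there first.
 * Returns 0 on success, -1 if off is out of range or no call is found.
 */
static int nop_out_call(uint8_t *text, size_t size, size_t off,
                        uint8_t saved[CALL_REL32_LEN])
{
    if (off > size || size - off < CALL_REL32_LEN)
        return -1;
    if (text[off] != CALL_REL32_OPCODE)
        return -1;
    if (saved != NULL)
        memcpy(saved, text + off, CALL_REL32_LEN);
    memset(text + off, 0x90, CALL_REL32_LEN);
    return 0;
}

/* Put the saved call back, i.e. activate the marker again.
   The caller must pass an offset that nop_out_call() accepted. */
static void restore_call(uint8_t *text, size_t off,
                         const uint8_t saved[CALL_REL32_LEN])
{
    assert(saved[0] == CALL_REL32_OPCODE);
    memcpy(text + off, saved, CALL_REL32_LEN);
}
```

Because the compiler generated a real call, the argument setup and register saving around the slot stay in place even while the call itself is replaced by NOPs.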

* Re: [RFC] Proposal of marker implementation
  2006-08-10  4:15 ` Nicholas Miell
@ 2006-08-21  1:58   ` Masami Hiramatsu
  0 siblings, 0 replies; 8+ messages in thread
From: Masami Hiramatsu @ 2006-08-21  1:58 UTC (permalink / raw)
  To: Nicholas Miell; +Cc: fche, yumiko.sugita.yf, soshima, haoki, SystemTAP

Hi, Nicholas

Nicholas Miell wrote:
> On Wed, 09 Aug 2006 14:33:11 +0900, Masami Hiramatsu wrote:
> 
>> Each marker is a 6-byte NOP. I think we can safely replace it with a jump
>> instruction by using djprobe. Note: we cannot replace it with a "call"
>> instruction, because the called code would clobber some caller-saved registers.
> 
> You can use a call instruction if the function you're calling is an
> assembly stub which saves caller-saved registers and then calls a C
> function (passing the saved return address as a parameter so the probe
> point can be located).

It's a nice idea, provided that SystemTap can handle a modified pt_regs
structure. My jump-based method can provide a pt_regs structure just as
kprobes does, so SystemTap can handle it transparently.
A call instruction, however, changes the contents of the stack,
which means that the marker needs a special handler.
Anyway, I think the idea is useful and worth considering.

> Alternately, you could do a normal C function call and then replace
> the call instruction with NOPs as a post-processing step. This method lets
> the C compiler save the caller-saved registers for you, and you can easily
> pass parameters.

Exactly. With this method, however, we still pay the cost of saving the
caller-saved registers even when the markers are disabled.

Thanks

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



* Re: [RFC] Proposal of marker implementation
  2006-08-09  5:33 [RFC] Proposal of marker implementation Masami Hiramatsu
  2006-08-10  2:37 ` Frank Ch. Eigler
  2006-08-10  4:15 ` Nicholas Miell
@ 2006-08-31 17:52 ` Frank Ch. Eigler
  2006-09-01  2:55   ` Masami Hiramatsu
  2 siblings, 1 reply; 8+ messages in thread
From: Frank Ch. Eigler @ 2006-08-31 17:52 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: SystemTAP, Yumiko Sugita, Satoshi Oshima, Hideo Aoki

Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:

> I'd like to propose the marker idea that I talked about at OLS.
> It is based on ELF binary sections and djprobe.
> [...]

Could this approach be made "pluggable" in the sense of
interchangeable with the other type at the call site?  Can you make a
version of these macros that adds reliable parameter passing?  Can you
outline a proof-of-concept of the probe that would use these hooks?
Is live activation/deactivation of the probes a problem on complex
hosts (smp / preempt)?

- FChE

* Re: [RFC] Proposal of marker implementation
  2006-08-31 17:52 ` Frank Ch. Eigler
@ 2006-09-01  2:55   ` Masami Hiramatsu
  2006-09-01 13:53     ` Frank Ch. Eigler
  0 siblings, 1 reply; 8+ messages in thread
From: Masami Hiramatsu @ 2006-09-01  2:55 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: SystemTAP, Yumiko Sugita, Satoshi Oshima, Hideo Aoki

Hi Frank,

I discussed this idea and your questions within Hitachi, and I have
decided to shelve it, because implementing it on other architectures
is not easy and there is no evidence of a big advantage.

I was just concerned about the overhead of the current approach, which
comes from accessing variables. For now I'd like to prioritize other
work that needs to be done, for example the flight recorder,
kprobe-booster for other architectures, integrated tracing scripts,
and non-marker-based djprobe.

Thanks,

Frank Ch. Eigler wrote:
> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> 
>> I'd like to propose the marker idea that I talked about at OLS.
>> It is based on ELF binary sections and djprobe.
>> [...]
> 
> Could this approach be made "pluggable" in the sense of
> interchangeable with the other type at the call site? Can you make a
> version of these macros that adds reliable parameter passing?  Can you
> outline a proof-of-concept of the probe that would use these hooks?
> Is live activation/deactivation of the probes a problem on complex
> hosts (smp / preempt)?
> 
> - FChE
> 
> 

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

* Re: [RFC] Proposal of marker implementation
  2006-09-01  2:55   ` Masami Hiramatsu
@ 2006-09-01 13:53     ` Frank Ch. Eigler
  0 siblings, 0 replies; 8+ messages in thread
From: Frank Ch. Eigler @ 2006-09-01 13:53 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: systemtap, Yumiko Sugita, Satoshi Oshima, Hideo Aoki

Hi -

> I discussed this idea and your questions within Hitachi, and I have
> decided to shelve it, because implementing it on other architectures
> is not easy and there is no evidence of a big advantage.
> 
> I was just concerned about the overhead of the current approach [...]

I understand.  Thank you for thinking of these other methods.
In time, we may include several of the ideas in the code.

- FChE

* [RFC] Proposal of marker implementation
@ 2006-08-10 17:45 Chuck Ebbert
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Ebbert @ 2006-08-10 17:45 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Hideo Aoki, Satoshi Oshima, Yumiko Sugita, systemtap

In-Reply-To: <44D97397.2080005@hitachi.com>

On Wed, 09 Aug 2006 14:33:11 +0900, Masami Hiramatsu wrote:

> I'd like to propose the marker idea that I talked about at OLS.
> It is based on ELF binary sections and djprobe.
>
> Here is the concept code on i386 architecture.
> ---
> #define __MARKER_NOP(name) \
>         asm volatile ("771:\n\t" ASM_NOP6 "\n772:\n"            \
>                       ".section .markers,\"a\"\n"               \
>                       "  .align 4\n"                            \
>                       "  .long 771b\n"            /* label */   \
>                       "  .byte 772b-771b\n"       /* length */  \
>                       "  .string \"" #name "\"\n" /* name */    \
>                       ".previous\n"                             \
>                       ::: "memory")

Why do you clobber memory?

If you explicitly clobber "eax", "ecx", "edx" you can safely change your
no-op into a C function call, assuming no problems with preempt and/or
SMP synchronization.

Even if you have to use a 1-byte no-op and replace it with Int3 there
could be some advantages to your approach:
        a. No worry about caller-saved regs if the marker clobbers them.
        b. Don't need to save the replaced byte because it's always 0x90.
        c. No need to execute the replaced insn out-of-line since it's
           a no-op.
        d. Solves the problem of "first instruction in a function is
           part of a loop."
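
Points b and c can be illustrated with a short C sketch of arming and disarming a 1-byte marker in a copy of the text. It assumes the marker slot always holds the 1-byte NOP 0x90 when inactive; the function names are hypothetical, and the synchronization needed to patch live code on SMP or preemptible kernels is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOP1_OPCODE 0x90   /* 1-byte NOP placed by the marker */
#define INT3_OPCODE 0xCC   /* breakpoint instruction */

/*
 * Arm the marker by replacing its NOP with int3.
 * Returns 0 on success, -1 if the slot does not hold the expected NOP.
 */
static int marker_arm(uint8_t *slot)
{
    assert(slot != NULL);
    if (*slot != NOP1_OPCODE)
        return -1;
    *slot = INT3_OPCODE;
    return 0;
}

/*
 * Disarm the marker. No saved copy of the original byte is needed,
 * because the original instruction is always the 1-byte NOP.
 */
static void marker_disarm(uint8_t *slot)
{
    assert(slot != NULL);
    assert(*slot == INT3_OPCODE || *slot == NOP1_OPCODE);
    *slot = NOP1_OPCODE;
}
```

After the breakpoint handler runs, execution can continue at the byte after the slot, since there is no displaced instruction to single-step out of line.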

-- 
Chuck
