RE: Hitachi djprobe mechanism

public inbox for systemtap@sourceware.org

* RE: Hitachi djprobe mechanism
@ 2005-07-27 21:05 Keshavamurthy, Anil S
  2005-07-28  1:51 ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-07-27 21:05 UTC (permalink / raw)
  To: Masami Hiramatsu, Roland McGrath
  Cc: Richard J Moore, SystemTAP, sugita, Satoshi Oshima

Hi Masami,
	The paper you mention below talks about overwriting
a single instruction at the instrumentation point, as opposed
to what djprobe does, which is replacing multiple instructions
(in order to write a 5-byte jmp instruction).

Having to replace multiple instructions in order to
insert a long jump instruction is a very dangerous thing,
as some process on some CPU might have been preempted
in the middle of those instructions and is expected
to continue from an instruction boundary inside them, which
now holds data bytes of the overwritten jump instruction.
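A minimal sketch of the condition this implies (user-space C written for illustration, not djprobe code; `patch_range_is_busy` and the list of saved instruction pointers are hypothetical): a multi-byte patch is only safe if no CPU and no preempted task will resume at an address strictly inside the replaced bytes. Collecting those saved IPs reliably from every CPU and every preempted task is the hard part and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: true if any saved instruction pointer lies strictly
 * inside [patch_start, patch_start + patch_len). An IP equal to patch_start
 * is harmless (that task executes the new jmp from its first byte); an IP
 * past the first byte would resume in the middle of the jmp's displacement.
 */
bool patch_range_is_busy(const uintptr_t *saved_ips, size_t n_ips,
                         uintptr_t patch_start, size_t patch_len)
{
    assert(saved_ips != NULL || n_ips == 0);
    for (size_t i = 0; i < n_ips; i++) {
        uintptr_t ip = saved_ips[i];
        if (ip > patch_start && ip < patch_start + patch_len)
            return true;
    }
    return false;
}
```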

I think that overwriting just a single-instruction
is always hazard-free and should be followed in djprobe. 
The paper clearly explains how to achieve this using what
is known as springboard technique.

Please let me know your thoughts on this.

-thanks,
Anil
 

>-----Original Message-----
>From: systemtap-owner@sources.redhat.com 
>[mailto:systemtap-owner@sources.redhat.com] On Behalf Of 
>Masami Hiramatsu
>Sent: Wednesday, July 27, 2005 6:02 AM
>To: Roland McGrath
>Cc: Richard J Moore; SystemTAP; sugita@sdl.hitachi.co.jp; 
>Satoshi Oshima
>Subject: Re: Hitachi djprobe mechanism
>
>Hi, Roland
>
>Roland McGrath wrote:
>>>  I think Kerninst is similar in effect to djprobe. both of them copy
>>>original code to a buffer and jump to the buffer.
>>>  However I think that the most unique feature of djprobe is use of
>>>"bypass" route to safely insert code on SMP.
>>>  I cannot find SMP safety mechanism like "bypass" in kerninst papers
>>>yet.
>> 
>> 
>> If by this you mean inserting an int3 while writing the rest of the jmp
>> instruction and then overwriting the first byte when the rest is in place,
>> I recall reading about that in some kerninst paper to be sure.
>
>Thanks a lot.
>Finally, I found it on page 9 of the OSDI paper:
>"Fine-Grained Dynamic Instrumentation of Commodity Operating System Kernels",
>Ariel Tamches and Barton P. Miller, OSDI, Feb 1999.
>
>Actually, it seems to describe a similar thing.
>
>-- 
>Masami HIRAMATSU
>2nd Research Dept.
>Hitachi, Ltd., Systems Development Laboratory
>E-mail: hiramatu@sdl.hitachi.co.jp
>
>
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-27 21:05 Hitachi djprobe mechanism Keshavamurthy, Anil S
@ 2005-07-28  1:51 ` Karim Yaghmour
  2005-07-28  2:10   ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-07-28  1:51 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Masami Hiramatsu, Roland McGrath, Richard J Moore, SystemTAP,
	sugita, Satoshi Oshima

Keshavamurthy, Anil S wrote:
> I think that overwriting just a single-instruction
> is always hazard-free and should be followed in djprobe. 
> The paper clearly explains how to achieve this using what
> is known as springboard technique.

From the article's text:
"The springboard approach requires chunks of scratch space (collectively,
the springboard heap) to be conveniently sprinkled throughout the kernel,
so that every kernel instruction can reach some chunk when using one of
the suitable instructions ..."
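Why the chunks have to be "sprinkled": a short x86 jmp (opcode EB plus a signed 8-bit displacement, 2 bytes in total) only reaches targets within about 128 bytes of the following instruction. A sketch of that reach check, assuming the 2-byte encoding (`short_jmp_reaches` is a name made up for this illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * x86 short jmp: EB rel8. The displacement is relative to the address of
 * the next instruction, i.e. from + 2, and must fit in a signed byte.
 */
bool short_jmp_reaches(intptr_t from, intptr_t target)
{
    intptr_t disp = target - (from + 2);
    return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

A springboard chunk therefore has to sit within that small window of every probe site that uses a short jmp, which is why a heap of scratch chunks spread throughout the kernel is needed.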

The text goes on to explain that kerninst hijacks the loadable module
functionality and uses the initialization/finalization functions' address
space to achieve its goals. However, the article kind of glosses over the
implications of this. This seems like a very racy thing to do, and
certainly makes the loading/unloading process kind of problematic. Not
to mention that it won't work with kernels that have no modules to
start with, or for which the only modules loaded are used at boot time to
mount the rootfs.

So unless there's some other way to create/obtain a springboard heap,
this too seems limited.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


* Re: Hitachi djprobe mechanism
  2005-07-28  1:51 ` Karim Yaghmour
@ 2005-07-28  2:10   ` Karim Yaghmour
  2005-07-28 16:23     ` Masami Hiramatsu
  0 siblings, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-07-28  2:10 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Masami Hiramatsu, Roland McGrath, Richard J Moore, SystemTAP,
	sugita, Satoshi Oshima


Karim Yaghmour wrote:
> From the article's text:
> "The springboard approach requires chunks of scratch space (collectively,
> the springboard heap) to be conveniently sprinkled throughout the kernel,
> so that every kernel instruction can reach some chunk when using one of
> the suitable instructions ..."

Also, there's this bit I missed from the figure the text refers to as
containing the list of instructions that can be used for various architectures
(figure 4.6):

"None of the architectures has an ideal splicing instruction; either
displacement is insufficient (RISC architectures), or there is no
guarantee that only a single instruction is overwritten when splicing (x86)."

To the best of my understanding, the latter seems to imply that springboards
have the very same limitations mentioned earlier for djprobe.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


* Re: Hitachi djprobe mechanism
  2005-07-28  2:10   ` Karim Yaghmour
@ 2005-07-28 16:23     ` Masami Hiramatsu
  2005-07-28 16:28       ` Karim Yaghmour
  2005-07-28 18:13       ` Richard J Moore
  0 siblings, 2 replies; 83+ messages in thread
From: Masami Hiramatsu @ 2005-07-28 16:23 UTC (permalink / raw)
  To: karim
  Cc: Keshavamurthy, Anil S, Masami Hiramatsu, Roland McGrath,
	Richard J Moore, SystemTAP, sugita, Satoshi Oshima

Hi,

2005/7/28, Karim Yaghmour <karim@opersys.com>:
> Karim Yaghmour wrote:
> > From the article's text:
> > "The springboard approach requires chunks of scratch space (collectively,
> > the springboard heap) to be conveniently sprinkled throughout the kernel,
> > so that every kernel instruction can reach some chunk when using one of
> > the suitable instructions ..."
>
> Also, there's this bit I missed from the figure the text refers to as
> containing the list of instructions that can be used for various architectures
> (figure 4.6):
>
> "None of the architectures has an ideal splicing instruction; either
> displacement is insufficient (RISC architectures), or there is no
> guarantee that only a single instruction is overwritten when splicing (x86)."
>
> To the best of my understanding, the latter seems to imply that springboards
> have the very same limitations mentioned earlier for djprobe.

I think so. The smallest jmp instruction on i386 is 2 bytes,
but the smallest instruction is 1 byte (e.g. pushl %esi).
I will try to add a safety check routine in sched() and do_IRQ().
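The point in numbers: a sketch that counts how many instructions a patch of a given size would overwrite, given the lengths of the instructions at the probe point. The lengths are hypothetical input here; a real implementation would need an instruction decoder to obtain them, and `insns_overwritten` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the lengths of consecutive instructions starting at the probe
 * address, return how many of them a patch of patch_len bytes touches.
 * Any result > 1 means an instruction boundary other than the first lies
 * inside the patched bytes, which is the unsafe case discussed here.
 */
size_t insns_overwritten(const size_t *insn_len, size_t n, size_t patch_len)
{
    size_t covered = 0, count = 0;
    while (count < n && covered < patch_len) {
        assert(insn_len[count] > 0);
        covered += insn_len[count];
        count++;
    }
    return count;
}
```

With two 1-byte instructions (such as pushl %esi) at the probe point, even the 2-byte short jmp already spans two instructions.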

--
Masami Hiramatsu
mailto:masami.hiramatsu@gmail.com


* Re: Hitachi djprobe mechanism
  2005-07-28 16:23     ` Masami Hiramatsu
@ 2005-07-28 16:28       ` Karim Yaghmour
  2005-07-28 17:36         ` Mathieu Desnoyers
  2005-07-28 18:13       ` Richard J Moore
  1 sibling, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-07-28 16:28 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Keshavamurthy, Anil S, Masami Hiramatsu, Roland McGrath,
	Richard J Moore, SystemTAP, sugita, Satoshi Oshima


Masami Hiramatsu wrote:
> I think so. The smallest jmp instruction on i386 is 2 bytes,
> but the smallest instruction is 1 byte (e.g. pushl %esi).
> I will try to add a safety check routine in sched() and do_IRQ().

I'm sorry, I'm probably missing something. What will the checks in
sched() and do_IRQ() do to avoid problems?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


* Re: Hitachi djprobe mechanism
  2005-07-28 16:28       ` Karim Yaghmour
@ 2005-07-28 17:36         ` Mathieu Desnoyers
       [not found]           ` <20050728110717.A30199@unix-os.sc.intel.com>
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-28 17:36 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Masami Hiramatsu, Keshavamurthy, Anil S, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, SystemTAP, sugita,
	Satoshi Oshima, michel.dagenais

* Karim Yaghmour (karim@opersys.com) wrote:
> 
> Masami Hiramatsu wrote:
> > I think so. The smallest jmp instruction on i386 is 2 bytes,
> > but the smallest instruction is 1 byte (e.g. pushl %esi).
> > I will try to add a safety check routine in sched() and do_IRQ().
> 
> I'm sorry, I'm probably missing something. What will the checks in
> sched() and do_IRQ() do to avoid problems?
> 

I suggest this approach :

* Using a landing zone for the probe initially filled with something like :

(for a 5-byte jmp instruction)

local_irq_save ("pushfl ; popl %0 ; cli")
nop
nop
nop
nop
local_irq_restore ("pushl %0 ; popfl")

This protects the landing zone from interrupts (and therefore from preemption) on
every CPU.
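For reference, one possible byte encoding of this landing zone, assuming the compiler picked %eax for the "%0" operand (the array and offset names are illustrative). The cli plus the four one-byte nops form exactly the 5 bytes that a jmp rel32 (opcode E9 plus a 4-byte displacement) needs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, assuming %0 == %eax. */
static const uint8_t landing_zone[] = {
    0x9C,                   /* pushfl                        */
    0x58,                   /* popl  %eax                    */
    0xFA,                   /* cli   <- the patch starts here */
    0x90, 0x90, 0x90, 0x90, /* 4 x nop                       */
    0x50,                   /* pushl %eax <- resume point    */
    0x9D,                   /* popfl                         */
};

enum { PATCH_OFF = 2, RESUME_OFF = 7 };

/* Number of bytes between the cli and the pushl: the room for the jmp. */
size_t patch_region_len(void)
{
    return (size_t)(RESUME_OFF - PATCH_OFF);
}
```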

* Let's see what the code alteration function could do :

You may then replace the cli instruction of local_irq_save with an int3
instruction. You then simply check that no other CPU has interrupts disabled
(using an IPI). You are then sure that no other CPU is in the zone you want to
modify (or has an address falling in this zone as a return address from an
interrupt).

The only problem you will have is if an NMI comes in at the middle of the nops.
But hey! The processors won't answer to your low priority IPI until the NMI
handler has finished and interrupts are reenabled.

Once you know that your zone

int3
nop
nop
nop
nop

is protected, you just have to change those 5 bytes for your jmp (make sure that
int3 is the last one to be changed).
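That ordering can be sketched on a plain byte buffer, assuming the cli has already been replaced with int3 and that `sync` stands for whatever cross-CPU synchronization is used between steps (IPI, serialization); the function and callback names are hypothetical. At every step another CPU sees either int3 (which traps) or the complete jmp opcode, never a half-written instruction. Multi-byte stores are assumed little-endian, as on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OP_INT3  0xCC  /* one-byte breakpoint                     */
#define OP_JMP32 0xE9  /* jmp rel32: opcode + 4-byte displacement */

/* Stand-in for the cross-CPU synchronization done between steps. */
typedef void (*sync_fn)(const uint8_t *zone, void *ctx);

/*
 * zone[0] must already hold int3 (it replaced the cli); zone[1..4] are
 * the nops. Write the displacement first and the opcode byte last.
 */
void finish_jmp_patch(uint8_t *zone, int32_t rel32, sync_fn sync, void *ctx)
{
    assert(zone[0] == OP_INT3);
    memcpy(zone + 1, &rel32, sizeof rel32);  /* overwrite the 4 nops          */
    sync(zone, ctx);
    zone[0] = OP_JMP32;                      /* int3 is the last byte changed */
    sync(zone, ctx);
}

/* Example sync hook for testing: records zone[0] after each step. */
struct byte_log { uint8_t seen[4]; int n; };

void log_first_byte(const uint8_t *zone, void *ctx)
{
    struct byte_log *rec = ctx;
    if (rec->n < 4)
        rec->seen[rec->n++] = zone[0];
}
```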

* The int3 handler could then simply return to the exact spot of the pushl %0.

The downside of this approach is that it needs a marker in the code and has a
small impact on a system performance when it is not traced.

Any comments ?

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


* Re: Hitachi djprobe mechanism
  2005-07-28 16:23     ` Masami Hiramatsu
  2005-07-28 16:28       ` Karim Yaghmour
@ 2005-07-28 18:13       ` Richard J Moore
  1 sibling, 0 replies; 83+ messages in thread
From: Richard J Moore @ 2005-07-28 18:13 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Keshavamurthy, Anil S, Masami Hiramatsu, karim, Roland McGrath,
	Satoshi Oshima, sugita, SystemTAP





Masami Hiramatsu <masami.hiramatsu@gmail.com> wrote on 28/07/2005 17:22:46:

> Hi,
>
> 2005/7/28, Karim Yaghmour <karim@opersys.com>:
> > Karim Yaghmour wrote:
> > > From the article's text:
> > > "The springboard approach requires chunks of scratch space (collectively,
> > > the springboard heap) to be conveniently sprinkled throughout the kernel,
> > > so that every kernel instruction can reach some chunk when using one of
> > > the suitable instructions ..."
> >
> > Also, there's this bit I missed from the figure the text refers to as
> > containing the list of instructions that can be used for various architectures
> > (figure 4.6):
> >
> > "None of the architectures has an ideal splicing instruction; either
> > displacement is insufficient (RISC architectures), or there is no
> > guarantee that only a single instruction is overwritten when splicing (x86)."
> >
> > To the best of my understanding, the latter seems to imply that springboards
> > have the very same limitations mentioned earlier for djprobe.
>
> I think so. The smallest jmp instruction on i386 is 2 bytes,
> but the smallest instruction is 1 byte (e.g. pushl %esi).
> I will try to add a safety check routine in sched() and do_IRQ().

That's why "int3" is not the same as "int 3", i.e. it's one byte instead of
two. Nothing else works quite as well for breakpointing purposes.

>
> --
> Masami Hiramatsu
> mailto:masami.hiramatsu@gmail.com


- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


* Re: Hitachi djprobe mechanism
       [not found]           ` <20050728110717.A30199@unix-os.sc.intel.com>
@ 2005-07-28 18:33             ` Mathieu Desnoyers
       [not found]               ` <20050728133456.A32210@unix-os.sc.intel.com>
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-28 18:33 UTC (permalink / raw)
  To: Keshavamurthy Anil S
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, SystemTAP, sugita,
	Satoshi Oshima, michel.dagenais

* Keshavamurthy Anil S (anil.s.keshavamurthy@intel.com) wrote:
> Your suggestion has several limitations. The moment you replace cli with int3,
> some thread on some CPU might hit this int3, which needs to be handled;
> and not only that, after it is handled this thread might start to execute
> the nop instruction right after the int3, at which point in time this
> thread might be preempted. Again we end up in the same situation as before.
> 
> Also, checking that no other CPU has interrupts disabled using an IPI involves
> tight spinning in the IPI callback handler, which is prone to deadlock.
> 

The goal of this int3 is indeed that, if another thread on another CPU falls
into it, the int3 handler will change its own return address to fall into the
push instruction, right after the nops.

For the IPI :

- the on_each_cpu should be called from a context where no spinlock is held. The
  context from which it is called could be protected by a flag which would cause
  any concurrent processor to fail if they try to follow the same "code
  alteration" function.
- the IPI handler does not need to spin at all. It just returns. It only has to
  inform us about when interrupt disabled zone has ended. As this zone will not
  be reentered due to the int3 instruction, everything is fine.

Or did I miss something ?


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


* Re: Hitachi djprobe mechanism
       [not found]               ` <20050728133456.A32210@unix-os.sc.intel.com>
@ 2005-07-28 23:53                 ` Richard J Moore
  2005-07-29  5:59                 ` Mathieu Desnoyers
  1 sibling, 0 replies; 83+ messages in thread
From: Richard J Moore @ 2005-07-28 23:53 UTC (permalink / raw)
  To: Keshavamurthy Anil S
  Cc: Keshavamurthy Anil S, Mathieu Desnoyers, Masami Hiramatsu,
	Karim Yaghmour, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, Satoshi Oshima, sugita, systemtap





There are more efficient ways of implementing a jmp-type hook; see the
kernel hooks package, where we evolved past this string-of-5-no-ops
implementation. There we moved an immediate value (1 byte) into a register and
jumped on the register being non-zero. To spring the hook we stored the one
immediate byte in the mov instruction. This technique works quite well on
IA64, where one can use a predicate register for the same purpose.
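A sketch of such a hook site as bytes, with a C model of the run-time decision. The exact encoding (mov to %al, test, jnz) and all names are assumptions for illustration, not taken from the kernel hooks package:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook site:
 *   B0 00    mov  $0x00, %al   <- the immediate byte is the on/off switch
 *   84 C0    test %al, %al
 *   75 10    jnz  <hook handler, 16 bytes ahead>
 */
static uint8_t hook_site[] = { 0xB0, 0x00, 0x84, 0xC0, 0x75, 0x10 };
enum { HOOK_IMM_OFF = 1 };

/* Arming or disarming is a single one-byte store into the mov. */
void hook_set(uint8_t *site, bool on)
{
    assert(site[0] == 0xB0);
    site[HOOK_IMM_OFF] = on ? 1 : 0;
}

/* What the site decides at run time: branch taken iff the immediate != 0. */
bool hook_taken(const uint8_t *site)
{
    return site[HOOK_IMM_OFF] != 0;
}
```

Unlike the 5-byte jmp, arming changes a single byte, so no CPU can fetch a partially written instruction; the trade-off is that the mov/test/jnz sequence still executes when the hook is off.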
- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


                                                                           
Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com> wrote on 28/07/2005 21:34:

To: Mathieu Desnoyers <compudj@krystal.dyndns.org>
cc: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>,
    Karim Yaghmour <karim@opersys.com>,
    Masami Hiramatsu <masami.hiramatsu@gmail.com>,
    Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>,
    Roland McGrath <roland@redhat.com>, Richard J Moore/UK/IBM@IBMGB,
    systemtap@sources.redhat.com, sugita@sdl.hitachi.co.jp,
    Satoshi Oshima <soshima@redhat.com>, michel.dagenais@polymtl.ca
Subject: Re: Hitachi djprobe mechanism




On Thu, Jul 28, 2005 at 02:32:36PM -0400, Mathieu Desnoyers wrote:
> The goal of this int3 is indeed that, if another thread on another CPU falls
> into it, the int3 handler will change its own return address to fall into the
> push instruction, right after the nops.
Ah.. I see your point now.

>
> For the IPI :
>
> - the on_each_cpu should be called from a context where no spinlock is held. The
>   context from which it is called could be protected by a flag which would cause
>   any concurrent processor to fail if they try to follow the same "code
>   alteration" function.
> - the IPI handler does not need to spin at all. It just returns. It only has to
>   inform us about when interrupt disabled zone has ended. As this zone will not
>   be reentered due to the int3 instruction, everything is fine.
If we are not spinning in the IPI, then I am fine.


>
> Or did I miss something ?

However, I have one more issue to mention (for discussion's sake): when we have
several nop instructions (maybe 4 nop instructions) in a landing zone, we have
no guarantee that all of these nop instructions are within a page boundary.
This is very important because when you replace these instructions with a
5-byte jump instruction, this 5-byte jump instruction should be within a page
boundary, else the processor will generate an unaligned instruction access
violation, or it might generate a page fault while trying to execute an
instruction, which is a very bad thing.
How are we going to address this issue?

Thanks,
Anil



* Re: Hitachi djprobe mechanism
       [not found]               ` <20050728133456.A32210@unix-os.sc.intel.com>
  2005-07-28 23:53                 ` Richard J Moore
@ 2005-07-29  5:59                 ` Mathieu Desnoyers
  2005-07-29  7:55                   ` Andi Kleen
  2005-07-29 16:06                   ` Frank Ch. Eigler
  1 sibling, 2 replies; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-29  5:59 UTC (permalink / raw)
  To: Keshavamurthy Anil S
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

Well, it seems that it has all been thought about before us, thanks to Richard
Moore.

* Keshavamurthy Anil S (anil.s.keshavamurthy@intel.com) wrote:

> However, I have one more issue to mention (for discussion's sake): when we have several nop
> instructions (maybe 4 nop instructions) in a landing zone, we have no guarantee that
> all of these nop instructions are within a page boundary. This is very important
> because when you replace these instructions with a 5-byte jump instruction, this
> 5-byte jump instruction should be within a page boundary, else the processor
> will generate an unaligned instruction access violation

Well, a "push current interrupt register value", followed by ".align 8,0x90"
and then by cli, nop nop nop nop should do the trick. The 5-byte jmp would then
be aligned on an 8-byte memory address, right?
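Assuming the cli (and so the 5-byte jmp later written over it) starts at the address produced by .align 8, a quick check that the jmp can never straddle an 8-byte block, nor therefore a 4096-byte page, since 4096 is a multiple of 8. `crosses_boundary` is a helper made up for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + len) spans two blocks of size `block`
 * (e.g. block = 8 for the .align, 4096 for a page). */
bool crosses_boundary(uintptr_t addr, uintptr_t len, uintptr_t block)
{
    assert(block > 0 && len > 0);
    return (addr / block) != ((addr + len - 1) / block);
}
```

An 8-aligned start leaves 8 bytes before the next 8-byte boundary, more than the 5 the jmp needs; an unaligned start such as 4094 can split the jmp across two pages.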

> or it might
> generate a page fault while trying to execute an instruction, which is a very bad thing.

In the kernel, I would be very surprised to see that. In fact, even module.c,
which loads kernel code in virtual memory, only keeps it temporarily in this
location. It is put in kmalloc'd memory before the code is actually running.
Anyway, according to the IA-32 documentation, fault handlers are called before
the instruction executes. It shouldn't be any different from having to call an
unaligned instruction fault handler and then a page fault handler from this first
handler. Well, all this looks ugly anyway; no wonder they do not keep kernel code
in virtual memory.

And as the jmp instruction is 5 bytes, there seems to be no hope to find an
atomic operation that will write that.

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


* Re: Hitachi djprobe mechanism
  2005-07-29  5:59                 ` Mathieu Desnoyers
@ 2005-07-29  7:55                   ` Andi Kleen
  2005-07-29  8:44                     ` Richard J Moore
  2005-07-29 15:51                     ` Mathieu Desnoyers
  2005-07-29 16:06                   ` Frank Ch. Eigler
  1 sibling, 2 replies; 83+ messages in thread
From: Andi Kleen @ 2005-07-29  7:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

Mathieu Desnoyers <compudj@krystal.dyndns.org> writes:
> 
> And as the jmp instruction is 5 bytes, there seems to be no hope to find an
> atomic operation that will write that.

Any 64bit architecture can write 8 bytes mostly atomically (at least towards
readers) and many 32bit architectures (like newer x86 with cmpxchg or sse) 
can too.

An 8 byte read-modify-store is not protected against multiple writers,
but that is no problem for probes which can protect against that
with a different lock.

x86 could actually do it atomically even for writers with cmpxchg8.
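A user-space sketch of this suggestion, using C11 atomics in place of a hand-written cmpxchg8b (`patch_in_word` is a hypothetical name): the 5 jmp bytes are merged into the surrounding 8-byte aligned word and the whole word is stored with one compare-and-swap, so memory never holds a half-written jmp. Whether instruction fetch on another CPU also sees the word as one unit is a separate question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/*
 * Patch `len` bytes at byte offset `off` inside an 8-byte word with a
 * single 64-bit compare-and-swap, keeping the surrounding bytes intact.
 * The loop retries if another writer changed the word in between.
 */
void patch_in_word(_Atomic uint64_t *word, size_t off,
                   const uint8_t *bytes, size_t len)
{
    assert(off + len <= sizeof(uint64_t));
    uint64_t oldv = atomic_load(word);
    uint64_t newv;
    do {
        newv = oldv;
        memcpy((uint8_t *)&newv + off, bytes, len);
    } while (!atomic_compare_exchange_strong(word, &oldv, newv));
}
```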

-Andi


* Re: Hitachi djprobe mechanism
  2005-07-29  7:55                   ` Andi Kleen
@ 2005-07-29  8:44                     ` Richard J Moore
  2005-07-29  8:46                       ` Andi Kleen
  2005-07-29 15:51                     ` Mathieu Desnoyers
  1 sibling, 1 reply; 83+ messages in thread
From: Richard J Moore @ 2005-07-29  8:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: ak, Mathieu Desnoyers, Masami Hiramatsu, Karim Yaghmour,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap





that's a very good point. cmpxchg is not always considered for atomic
storing, though one does have to handle the complication of crossing page
boundaries.
- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


                                                                           
Andi Kleen <ak@suse.de> (sent by ak@suse.de) wrote on 29/07/2005 08:54:

To: Mathieu Desnoyers <compudj@krystal.dyndns.org>
cc: Karim Yaghmour <karim@opersys.com>,
    Masami Hiramatsu <masami.hiramatsu@gmail.com>,
    Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>,
    Roland McGrath <roland@redhat.com>, Richard J Moore/UK/IBM@IBMGB,
    systemtap@sources.redhat.com, sugita@sdl.hitachi.co.jp,
    Satoshi Oshima <soshima@redhat.com>, michel.dagenais@polymtl.ca
Subject: Re: Hitachi djprobe mechanism




Mathieu Desnoyers <compudj@krystal.dyndns.org> writes:
>
> And as the jmp instruction is 5 bytes, there seems to be no hope to find an
> atomic operation that will write that.

Any 64bit architecture can write 8 bytes mostly atomically (at least towards
readers) and many 32bit architectures (like newer x86 with cmpxchg or sse)
can too.

An 8 byte read-modify-store is not protected against multiple writers,
but that is no problem for probes which can protect against that
with a different lock.

x86 could actually do it atomically even for writers with cmpxchg8.

-Andi



* Re: Hitachi djprobe mechanism
  2005-07-29  8:44                     ` Richard J Moore
@ 2005-07-29  8:46                       ` Andi Kleen
  0 siblings, 0 replies; 83+ messages in thread
From: Andi Kleen @ 2005-07-29  8:46 UTC (permalink / raw)
  To: Richard J Moore
  Cc: Andi Kleen, Mathieu Desnoyers, Masami Hiramatsu, Karim Yaghmour,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap

On Fri, Jul 29, 2005 at 09:39:03AM +0100, Richard J Moore wrote:
> 
> 
> 
> 
> that's a very good point. cmpxchg is not always considered for atomic
> storing, though one does have to handle the complication of crossing page
> boundaries.

The CPU cancels and restarts the instruction in this case and everything is 
atomic again.

-Andi


* Re: Hitachi djprobe mechanism
  2005-07-29  7:55                   ` Andi Kleen
  2005-07-29  8:44                     ` Richard J Moore
@ 2005-07-29 15:51                     ` Mathieu Desnoyers
  2005-07-30 15:55                       ` Andi Kleen
  1 sibling, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-29 15:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

* Andi Kleen (ak@suse.de) wrote:
> Mathieu Desnoyers <compudj@krystal.dyndns.org> writes:
> > 
> > And as the jmp instruction is 5 bytes, there seems to be no hope to find an
> > atomic operation that will write that.
> 
> Any 64bit architecture can write 8 bytes mostly atomically (at least towards
> readers) and many 32bit architectures (like newer x86 with cmpxchg or sse) 
> can too.
> 
> An 8 byte read-modify-store is not protected against multiple writers,
> but that is no problem for probes which can protect against that
> with a different lock.
> 
> x86 could actually do it atomically even for writers with cmpxchg8.
> 
> -Andi
> 

It's probably a small bit of understanding of cmpxchg I am missing here, but what
would happen if:

CPU A                                 CPU B
read first byte of jmp instruction

                                      lock; cmpxchg of the 2 bytes of the jmp
                                      instruction.

read the second byte of jmp
instruction

As I see it, the write to memory is atomic, but the instruction fetch is not. In
that case, the reader would see an inconsistent last jmp address byte.

cmpxchg seems OK if the reader reads the 64 bits more than once to check whether
they have changed behind its back. Or is there a special lock prefix for a 64-bit
read operation I am not aware of? (on 32-bit archs)
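The race in the table can be modeled in a few lines (a sketch; `torn_rel32` is a made-up name and x86's little-endian byte order is assumed): a reader that fetches the first `split` bytes of the jmp before the store and the rest after it decodes a displacement that belongs to neither instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Bytes [0, split) come from the old instruction (fetched before the
 * atomic store), bytes [split, 5) from the new one (fetched after it).
 * Returns the rel32 displacement the reader would decode.
 */
int32_t torn_rel32(const uint8_t old_insn[5], const uint8_t new_insn[5], size_t split)
{
    assert(split <= 5);
    uint8_t seen[5];
    memcpy(seen, old_insn, split);
    memcpy(seen + split, new_insn + split, 5 - split);
    int32_t rel;
    memcpy(&rel, seen + 1, sizeof rel);   /* skip the E9 opcode byte */
    return rel;
}
```

For example, with an old jmp of +0x100 and a new jmp of +0x1234, a reader splitting after byte 2 decodes +0x1200.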


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


* Re: Hitachi djprobe mechanism
  2005-07-29  5:59                 ` Mathieu Desnoyers
  2005-07-29  7:55                   ` Andi Kleen
@ 2005-07-29 16:06                   ` Frank Ch. Eigler
  2005-07-29 18:24                     ` sugita
  1 sibling, 1 reply; 83+ messages in thread
From: Frank Ch. Eigler @ 2005-07-29 16:06 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: systemtap

Mathieu Desnoyers <compudj@krystal.dyndns.org> writes:

> [...]  And as the jmp instruction is 5 bytes, there seems to be no
> hope to find an atomic operation that will write that.

There are other ways to skin the cat.  Remember, the speed of
inserting/removing these probes is not that important.  We may be
willing to pay extraordinary conservative synchronization costs like
IPI, cache flushing, and so on, as long as during *execution* time,
the probes are fast.

- FChE


* Re: Hitachi djprobe mechanism
  2005-07-29 16:06                   ` Frank Ch. Eigler
@ 2005-07-29 18:24                     ` sugita
  0 siblings, 0 replies; 83+ messages in thread
From: sugita @ 2005-07-29 18:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mathieu Desnoyers, systemtap

Hi,

 I'm a member of the djprobe team.

"Frank Ch. Eigler" wrote:

> Mathieu Desnoyers <compudj@krystal.dyndns.org> writes:
>
> > [...]  And as the jmp instruction is 5 bytes, there seems to be no
> > hope to find an atomic operation that will write that.
>
> There are other ways to skin the cat.  Remember, the speed of
> inserting/removing these probes is not that important.  We may be
> willing to pay extraordinary conservative synchronization costs like
> IPI, cache flushing, and so on, as long as during *execution* time,
> the probes are fast.
>
> - FChE

You're right.
To reduce the cost during execution time is our main purpose.

We hope to develop the dynamic probe (djprobe) first,
and then discuss the static probe (marker).

Best regards,
Yumiko

*-*-*-*-*-*-
Yumiko Sugita
Hitachi, Ltd., Systems Development Laboratory
Email : sugita@sdl.hitachi.co.jp
                             　　　　　  -*-*-*-*-*-*


* Re: Hitachi djprobe mechanism
  2005-07-29 15:51                     ` Mathieu Desnoyers
@ 2005-07-30 15:55                       ` Andi Kleen
  2005-07-30 16:54                         ` Mathieu Desnoyers
  0 siblings, 1 reply; 83+ messages in thread
From: Andi Kleen @ 2005-07-30 15:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

> As I see it, the write in memory is atomic, but not the instruction fetching. In
> that case, the reader would see an inconsistent last jmp address byte.

Yes, you're right. cmpxchg only helps when the replaced instruction
is >= the new instruction. For smaller instructions only an IPI to
stop all CPUs works.
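To make the size condition concrete: the x86 near jump is `jmp rel32`, the opcode byte 0xE9 followed by a 32-bit displacement measured from the end of the 5-byte instruction. The C sketch below (user space only, GCC/Clang `__atomic` builtins, little-endian x86 host assumed) builds that jump and publishes it together with the three following bytes through one 8-byte compare-and-swap, which is only possible when the 5 bytes sit inside a single aligned 8-byte word. The function name `patch_jmp_atomic` and its offset-based interface are hypothetical. It demonstrates atomicity of the store only; as the rest of the thread points out, that alone says nothing about how another CPU fetches, or has already prefetched, those bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically overwrite the 5 bytes at code + off with
 * "jmp rel32" to code + target_off. The caller must guarantee that the
 * instruction at code + off is at least 5 bytes long and that `code` points
 * into 8-byte aligned storage (so the enclosing word is in bounds). */
static bool patch_jmp_atomic(uint8_t *code, size_t off, size_t target_off)
{
    assert(code != NULL && ((uintptr_t)code & 7) == 0);

    uintptr_t addr = (uintptr_t)(code + off);
    uintptr_t word_addr = addr & ~(uintptr_t)7;   /* enclosing aligned word */
    size_t shift = (size_t)(addr - word_addr);     /* byte offset inside it */
    if (shift + 5 > 8)
        return false;   /* the 5 bytes would straddle two words */

    /* The displacement is relative to the next instruction, i.e. off + 5. */
    int32_t rel = (int32_t)((intptr_t)target_off - (intptr_t)(off + 5));

    uint64_t *word = (uint64_t *)word_addr;
    uint64_t old = __atomic_load_n(word, __ATOMIC_SEQ_CST);
    uint64_t desired;
    do {
        uint8_t bytes[8];
        memcpy(bytes, &old, sizeof bytes);
        bytes[shift] = 0xE9;                      /* jmp rel32 opcode */
        memcpy(&bytes[shift + 1], &rel, 4);       /* little-endian host assumed */
        memcpy(&desired, bytes, sizeof desired);
        /* On failure, `old` is refreshed with the current value; retry. */
    } while (!__atomic_compare_exchange_n(word, &old, desired, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
    return true;
}
```

The three bytes after the jmp are carried over unchanged from the value read, so only the 5 bytes of the jump change in the single visible store.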

Actually there may be tricks possible: first int3 (or the equivalent single-byte
replacement on other archs) the second instruction, then the first, then wait
for an RCU period for all CPUs to quiesce, and then write the longer jump.
But an IPI is probably easier because it doesn't need a full disassembler,
and setting probes should not be performance critical.

-Andi

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-30 15:55                       ` Andi Kleen
@ 2005-07-30 16:54                         ` Mathieu Desnoyers
  2005-07-31 22:03                           ` Andi Kleen
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-30 16:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

* Andi Kleen (ak@suse.de) wrote:
> > As I see it, the write in memory is atomic, but not the instruction fetching. In
> > that case, the reader would see an inconsistent last jmp address byte.
> 
> Yes, you're right. cmpxchg only helps when the replaced instruction
> is >= the new instruction. For smaller instructions only an IPI to
> stop all CPUs works.
> 

That was not exactly the point of my comment. If we try to overwrite an existing
instruction without any marker, two cases may show up:

* the instruction to replace is >= the jmp instruction (5 bytes)

It has been suggested that using cmpxchg8b would solve this problem. cmpxchg8b
does indeed commit 8 bytes of data to memory atomically, even on 32-bit
architectures.

My question is about the instruction we want to replace: how is it read by
the CPU? If it's 5 bytes in size, it has to be read in two chunks by the CPU on
a 32-bit arch. Does the CPU lock the memory bus between those two reads?

If not, we still have a problem: the second read might be inconsistent with the
first one, even if the cmpxchg8b has been done atomically.

* the instruction to replace is < the jmp instruction (4 bytes or less)

If our goal is to overwrite code which has not been surrounded by a marker, an
IPI wouldn't save us here. The marker is necessary in order to disable
interrupts and make the IPI meaningful.

> Actually there may be tricks possible to first int3 (or equivalent single
> byte replacement on other archs) the second instruction,
> then the first, then wait for a RCU period of all CPUs to quiescence and then
> write the longer jump. But an IPI is probably easier because it doesn't need 
> a full disassembler for this and setting probes should not be performance
> critical.
>

Well, in fact, there is still a problem (oh no, not again!) ;)  RCU does
require the reader to disable preemption; otherwise there is no guarantee that
it won't be scheduled out in the middle of the critical section. RCU only
guarantees that a non-schedulable reader will have finished by the time the
RCU grace period is over.

How do you plan to disable involuntary preemption around the instructions you
want to overwrite?

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-30 16:54                         ` Mathieu Desnoyers
@ 2005-07-31 22:03                           ` Andi Kleen
  2005-07-31 23:11                             ` Mathieu Desnoyers
  2005-08-01  8:44                             ` Richard J Moore
  0 siblings, 2 replies; 83+ messages in thread
From: Andi Kleen @ 2005-07-31 22:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

On Sat, Jul 30, 2005 at 12:47:47PM -0400, Mathieu Desnoyers wrote:
> * Andi Kleen (ak@suse.de) wrote:
> > > As I see it, the write in memory is atomic, but not the instruction fetching. In
> > > that case, the reader would see an inconsistent last jmp address byte.
> > 
> > Yes, you're right. cmpxchg only helps when the replaced instruction
> > is >= the new instruction. For smaller instructions only an IPI to
> > stop all CPUs works.
> > 
> 
> It was not exactly the point of my comment. If we try to overwrite an existing
> instruction, without any marker, two cases may show up :
> 
> * the instruction to replace is >= the jmp instruction (5 bytes)
> 
> It has been suggested that using cmpxchg8 would solve this problem. cmpxchg8
> does indeed commit 8 bytes of data to memory atomically, even on 32 bits
> architectures.
> 
> My question is related to the instruction we want to replace : how is it read by
> the CPU ? If it's 5 bytes in size, it has to be read in two chunks by the cpu in
> a 32 bits arch. Does the CPU lock the memory bus between those two read ?

A 32-bit ISA has nothing to do with how the CPU fetches instructions
("32-bit" x86s usually have a much wider memory interface).

In general these things are done on cache lines of between 32 and 128 bytes,
depending on the CPU. Of course instructions can cross cache lines, but the
CPU should handle that atomically.

However, there is no guarantee of that in the architecture afaik, so you cannot
really rely on it. If, let's say, the 386 had this behaviour, then it is probably
safe to assume later x86s implement it too for compatibility (modulo bugs).

In practice it's more complicated. The CPU fetches the instruction into its
pipeline some time before actually executing it, then snoops the bus for any
modifications to it, and cancels and re-executes the instruction if needed.

However, when you look at CPU errata sheets you will find quite a lot
of bugs in this area, so I would not really rely on frequent patching in
production.

I think just using the IPI is much simpler and easier.


> * the instruction to replace is < the jmp instruction (4 bytes or less)
> 
> If our goal is to overwrite code which has not been surrounded by a marker, an
> IPI wouldn't save us here. The marker is necessary in order to disable
> interruptions and make the IPI meaningful.

You lost me here.


> 
> 
> > Actually there may be tricks possible to first int3 (or equivalent single
> > byte replacement on other archs) the second instruction,
> > then the first, then wait for a RCU period of all CPUs to quiescence and then
> > write the longer jump. But an IPI is probably easier because it doesn't need 
> > a full disassembler for this and setting probes should not be performance
> > critical.
> >
> 
> Well, in fact, there is still a problem. (oh no, not again!) ;)  The RCU does
> require the reader to disable preemption, otherwise there is no guarantee that
> they won't be scheduled out in the middle of the critical section, and the RCU
> does only guarantee that a non schedulable reader will have finished by the time
> the RCU period is over.
> 
> How do you plan to disable involuntary preemption around the instructions you
> want to overwrite ?


One way would be to just search the task list for any tasks blocked with an IP
inside the patched region. If there are any, wait for another quiescent period.
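A sketch of that check, under loudly simplified assumptions: each task is reduced to the single instruction pointer it would resume at (`struct task_snapshot`), and the snapshot source and the "wait for another quiescent period" primitive are passed in as callbacks. All of these names are hypothetical, not kernel APIs. A task parked exactly at the start of the region is harmless, since it will execute the new jmp from its first byte; only IPs strictly inside the region force another round.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of a task: only the instruction
 * pointer at which it would resume if it is preempted or sleeping. */
struct task_snapshot {
    uintptr_t saved_ip;
};

typedef size_t (*snapshot_fn)(struct task_snapshot *buf, size_t max);
typedef void (*wait_fn)(void);

/* True if no task would resume strictly inside (start, start + len).
 * Resuming exactly at `start` is fine: it executes the new jmp from byte 0. */
static bool region_is_quiet(const struct task_snapshot *tasks, size_t n,
                            uintptr_t start, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].saved_ip > start && tasks[i].saved_ip < start + len)
            return false;
    return true;
}

/* Take a snapshot; if some task is parked inside the region, wait for
 * another quiescent period and look again, up to max_rounds times. */
static bool wait_until_region_quiet(snapshot_fn snap, wait_fn wait_qs,
                                    uintptr_t start, size_t len,
                                    int max_rounds)
{
    struct task_snapshot buf[64];
    for (int round = 0; round < max_rounds; round++) {
        size_t n = snap(buf, sizeof buf / sizeof buf[0]);
        if (region_is_quiet(buf, n, start, len))
            return true;
        wait_qs();
    }
    return false;   /* give up; the caller could keep using plain int3 */
}

/* Dry-run stand-ins: one task stays parked inside the region until one
 * quiescent period has passed. */
static int waits_done;

static size_t fake_snapshot(struct task_snapshot *buf, size_t max)
{
    assert(max >= 2);
    buf[0].saved_ip = (waits_done == 0) ? 0x4002 : 0x9000;
    buf[1].saved_ip = 0x1000;
    return 2;
}

static void fake_wait(void) { waits_done++; }
```

With the dry-run stand-ins, the first snapshot finds a task at 0x4002 inside a region starting at 0x4000, so one extra wait happens before the check succeeds.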

-Andi

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-31 22:03                           ` Andi Kleen
@ 2005-07-31 23:11                             ` Mathieu Desnoyers
  2005-08-01 15:37                               ` Andi Kleen
  2005-08-01  8:44                             ` Richard J Moore
  1 sibling, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-31 23:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

* Andi Kleen (ak@suse.de) wrote:
> 
> One way would be to just search the task list for any tasks blocked with an IP
> inside the patched region. If yes rewait for another quiescent period.
> 
> 

If you stop the other CPUs' schedulers while you do that, then it's ok.

I just thought of an interesting way to implement the IPI, which would work
very well (and safely) for any case where the instruction to overwrite is >= 5
bytes. The idea:

- Send an IPI to each other CPU
  IPI args: * address we plan to write to
            * the new instruction we plan to write
  (The IPI handler could then spin in a loop, reading the address and
  waiting for it to contain the new instruction.)
- As we are sure that no other CPU is executing this code, we just have to
  write it to memory.

It doesn't work for smaller instructions (there is a problem if code jumps into
what has become an invalid instruction, or if an interrupt or preemption returns
there). Searching the list of tasks for an IP at this position would fix the
preemption problem, but not jumps into the region or interrupt returns.
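The ordering of this scheme can be modelled in user space, with POSIX threads standing in for the other CPUs and a thread start standing in for the IPI: each "CPU" first announces that it has entered the handler, then spins until the new bytes are visible, and the patching side writes only after every other CPU has checked in. `NCPUS`, `ipi_handler` and `patch_with_rendezvous` are hypothetical names; this models the synchronization order only and does not exercise real cross-modifying code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4
#define PATCH_LEN 5

/* Simulated code bytes; atomic so that spinning "CPUs" see the stores. */
static _Atomic uint8_t code[PATCH_LEN];
static const uint8_t new_insn[PATCH_LEN] = {0xE9, 0x10, 0x00, 0x00, 0x00};
static atomic_int in_handler;   /* CPUs that have entered the handler */

static int patch_visible(void)
{
    for (int i = 0; i < PATCH_LEN; i++)
        if (atomic_load(&code[i]) != new_insn[i])
            return 0;
    return 1;
}

/* What the hypothetical IPI handler does on every other CPU. */
static void *ipi_handler(void *arg)
{
    (void)arg;
    atomic_fetch_add(&in_handler, 1);   /* "I am not in the patched code" */
    while (!patch_visible())
        ;                               /* spin until the new bytes appear */
    /* A real handler must also execute a serializing instruction (e.g. the
     * return from interrupt) before it can run the patched code again. */
    return NULL;
}

/* Patching CPU: "send" the IPI, wait for everyone to park, then write. */
static void patch_with_rendezvous(void)
{
    pthread_t cpus[NCPUS - 1];

    atomic_store(&in_handler, 0);
    for (int i = 0; i < NCPUS - 1; i++) {
        int rc = pthread_create(&cpus[i], NULL, ipi_handler, NULL);
        assert(rc == 0);
        (void)rc;
    }
    while (atomic_load(&in_handler) < NCPUS - 1)
        ;                               /* not safe to write before this */
    for (int i = 0; i < PATCH_LEN; i++)
        atomic_store(&code[i], new_insn[i]);
    for (int i = 0; i < NCPUS - 1; i++)
        pthread_join(cpus[i], NULL);
}
```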


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-31 22:03                           ` Andi Kleen
  2005-07-31 23:11                             ` Mathieu Desnoyers
@ 2005-08-01  8:44                             ` Richard J Moore
  2005-08-01 13:21                               ` Mathieu Desnoyers
  2005-08-01 19:57                               ` Satoshi Oshima
  1 sibling, 2 replies; 83+ messages in thread
From: Richard J Moore @ 2005-08-01  8:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Mathieu Desnoyers, Masami Hiramatsu, Karim Yaghmour,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap

There is another issue to consider when looking into using probes other
than int3:

Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
practice of modifying code on one processor where another has prefetched
the unmodified version of the code. Intel states that unpredictable general
protection faults may result if a synchronizing instruction (iret, int,
int3, cpuid, etc.) is not executed on the second processor before it
executes the pre-fetched out-of-date copy of the instruction.

When we became aware of this I had a long discussion with Intel's
microarchitecture guys. It turns out that the reason for this erratum
(which incidentally Intel does not intend to fix) is that the trace
cache - the stream of micro-ops resulting from instruction interpretation -
cannot be guaranteed to be valid. Reading between the lines, I assume this
issue arises because of optimizations done in the trace cache, where it is
no longer possible to identify the original instruction boundaries. If the
CPU discovers that the trace cache has been invalidated because of
unsynchronized cross-modification, then instruction execution will be
aborted with a GPF. Further discussion with Intel revealed that replacing
the first opcode byte with an int3 would not be subject to this erratum.

So, is cmpxchg reliable? One has to guarantee more than mere atomicity.

- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01  8:44                             ` Richard J Moore
@ 2005-08-01 13:21                               ` Mathieu Desnoyers
  2005-08-01 19:57                               ` Satoshi Oshima
  1 sibling, 0 replies; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-01 13:21 UTC (permalink / raw)
  To: Richard J Moore
  Cc: Andi Kleen, Masami Hiramatsu, Karim Yaghmour, Masami Hiramatsu,
	michel.dagenais, Roland McGrath, Satoshi Oshima, sugita,
	systemtap

* Richard J Moore (richardj_moore@uk.ibm.com) wrote:
> 
> Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
> practice of modifying code on one processor where another has prefetched
> the unmodified version of the code. Intel states that unpredictable general
> protection faults may result if a synchronizing instruction (iret, int,
> int3, cpuid, etc ) is not executed on the second processor before it
> executes the pre-fetched out-of-date copy of the instruction.
> 

Well, using an IPI that makes all other CPUs loop, waiting for the specific
memory address to be written with the expected new instructions, seems to
fit this description too: they will all have to return from the interrupt
before going back to the modified code path.


[...]
> So, is cmpxchg reliable? One has to guarantee more than mere atomicity.
> 

Thanks for pointing that out: it gives the technical explanation of something I
only suspected.


Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-31 23:11                             ` Mathieu Desnoyers
@ 2005-08-01 15:37                               ` Andi Kleen
  0 siblings, 0 replies; 83+ messages in thread
From: Andi Kleen @ 2005-08-01 15:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

On Sun, Jul 31, 2005 at 06:59:41PM -0400, Mathieu Desnoyers wrote:
> * Andi Kleen (ak@suse.de) wrote:
> > 
> > One way would be to just search the task list for any tasks blocked with an IP
> > inside the patched region. If yes rewait for another quiescent period.
> > 
> > 
> 
> If you stop other cpus' scheduler when you do that, then it's ok.

You don't need to stop them; a snapshot of the task list is enough,
since you only care about preempted sleeping processes at a single
point in time.

Anyway, this discussion is theoretical because the IPI approach
is probably better.

> 
> I just though about an interesting way to implement the IPI, which would work
> very well (and safely) for any case where the instruction to overwrite is >= 5
> bytes. The idea :
> 
> - Send IPI to each other cpu
>   IP args : * address we plan to write to
>             * the new instruction we plan to write
>   (The IPI handler could then make an infinite loop, reading the address,
>   waiting for it to contain the new instruction.)

Seems far too complicated; just make it spin on a lock during the modification.


-Andi

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01  8:44                             ` Richard J Moore
  2005-08-01 13:21                               ` Mathieu Desnoyers
@ 2005-08-01 19:57                               ` Satoshi Oshima
  2005-08-01 20:21                                 ` Karim Yaghmour
  2005-08-03 14:46                                 ` Andi Kleen
  1 sibling, 2 replies; 83+ messages in thread
From: Satoshi Oshima @ 2005-08-01 19:57 UTC (permalink / raw)
  To: Richard J Moore, systemtap
  Cc: Andi Kleen, Mathieu Desnoyers, Masami Hiramatsu, Karim Yaghmour,
	Masami Hiramatsu, michel.dagenais, Roland McGrath, sugita

Hi,

I am a member of the djprobe team.

Thank you very much for your information.

We were not aware of that erratum when we designed djprobe.


We believe that djprobe can safely modify the code like this:

step 1: create the int3 bypass code using kprobe

step 2: safety check:
         make sure that no CPU is running the code that will
         be replaced with the jmp instruction (also check whether any
         stack includes an EIP inside the code being replaced)

step 3: (after all CPUs pass the safety check) write the jmp
         instruction except its first byte, leaving the int3 instruction
         unchanged for now (new step).

step 4: i-cache flush or serialization:
         execute an i-cache flush instruction such as CLFLUSH or a
         serializing instruction such as CPUID on all CPUs (new step)

step 5: (after all CPUs have executed the i-cache flush or serializing
         instruction) replace the int3 instruction with the first byte
         of the jmp instruction
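The five steps can be written down as one ordered sequence. In this sketch (all names hypothetical) the safety check of step 2 and the per-CPU serialization of step 4 are callbacks, because their real implementations are exactly what the thread is debating; the code only fixes the order of the byte writes, so that the first byte stays int3 until every CPU has serialized. It operates on a plain byte buffer, not on live kernel text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define JMP_LEN 5
#define INT3 0xCC

/* Hypothetical hooks for the parts whose real implementation is still
 * under discussion; here they are just function pointers. */
struct patch_ops {
    bool (*safety_check)(void);   /* step 2: no CPU or stack refers inside */
    void (*serialize_all)(void);  /* step 4: e.g. CPUID/CLFLUSH on all CPUs */
};

/* Applies the proposed order to `site`. Until step 5, byte 0 is int3, so a
 * CPU arriving at `site` traps into the kprobe bypass instead of running a
 * half-written jmp. */
static bool djprobe_style_patch(uint8_t *site, const uint8_t jmp[JMP_LEN],
                                const struct patch_ops *ops)
{
    site[0] = INT3;                           /* step 1: int3 bypass */
    if (!ops->safety_check())                 /* step 2 */
        return false;                         /* keep the int3 probe only */
    memcpy(site + 1, jmp + 1, JMP_LEN - 1);   /* step 3: all but byte 0 */
    ops->serialize_all();                     /* step 4 */
    site[0] = jmp[0];                         /* step 5: int3 -> jmp opcode */
    return true;
}

/* Single-threaded dry-run hooks. */
static int serialize_calls;
static bool always_safe(void) { return true; }
static bool never_safe(void) { return false; }
static void count_serialize(void) { serialize_calls++; }
```

If the safety check fails, the site is left with only the int3 probe, which matches the idea of falling back to kprobe when djprobe cannot be used.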

What do you think of this?


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 19:57                               ` Satoshi Oshima
@ 2005-08-01 20:21                                 ` Karim Yaghmour
  2005-08-01 22:12                                   ` Satoshi Oshima
                                                     ` (2 more replies)
  2005-08-03 14:46                                 ` Andi Kleen
  1 sibling, 3 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-01 20:21 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, sugita

Satoshi Oshima wrote:
> step 2: safety check;
>          make sure that all CPUs don't run on the code that will
>          be replaced with jmp instruction (also check whether stack
>          include EIP of the code which is subject to replace)

Please explain exactly how you will make sure that there is no pre-existing
reference to any of the replaced instructions, whether it be on the stack
or elsewhere. Consider a system that has many thousands of processes running
in parallel on different CPUs.

Also consider that you may find things on the stack that look like address
references to the range you wish to replace, but are actually valid data.
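A conservative stack scan shows both sides of this objection. The sketch below (hypothetical function, stack given as an array of machine words) flags every word whose value points strictly inside the region to be overwritten. Because it cannot distinguish a real return address from data with the same value, every hit has to be treated as a reference, so it errs towards refusing a safe patch. It also only covers the stacks it is given, not registers or any other place an address can be held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conservative scan: count stack words whose value points
 * strictly inside (start, start + len), i.e. could be a return address into
 * the middle of the bytes about to be overwritten. A match may be plain data
 * that happens to look like such an address, but the scan cannot tell, so
 * every match has to be treated as a live reference. */
static size_t count_suspect_words(const uintptr_t *stack, size_t nwords,
                                  uintptr_t start, size_t len)
{
    size_t hits = 0;
    for (size_t i = 0; i < nwords; i++)
        if (stack[i] > start && stack[i] < start + len)
            hits++;
    return hits;
}
```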

> step 3: (after all CPU pass safety check) replace with jmp
>          instruction without first byte. leave int 3 instruction
>          unchanged at this time (new step).

This still fails to cover the very simple case I explained earlier:
	if (...)
		goto label;
	<more code>
	single_byte_asm_instruction_code();
label:
	foo();

You still can't replace the instruction right before the label, and you'd
have to have an integrated disassembler to go through all the code and
make sure it too doesn't have a reference to the address of "label:".
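Karim's argument can be phrased as two checks, sketched below with hypothetical helpers: how many instructions a 5-byte jmp would displace (given instruction lengths, which in practice would have to come from a disassembler), and whether any known branch target, such as `label:` above, lands strictly inside the 5 bytes. The second check is only as good as the list of targets it is given; finding every target is the whole-program analysis Karim is objecting to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_LEN 5

/* Given the lengths of consecutive instructions starting at the probe site
 * (an input array here; a disassembler would supply it in practice), return
 * how many instructions a 5-byte jmp displaces, or 0 if the lengths run out
 * before 5 bytes are covered. */
static size_t displaced_insns(const uint8_t *lens, size_t n)
{
    size_t covered = 0;
    for (size_t i = 0; i < n; i++) {
        covered += lens[i];
        if (covered >= JMP_LEN)
            return i + 1;
    }
    return 0;
}

/* If any known branch target lands strictly inside [site, site + JMP_LEN),
 * jumping to it would execute the middle of the new jmp. */
static bool has_interior_target(uintptr_t site, const uintptr_t *targets,
                                size_t ntargets)
{
    for (size_t i = 0; i < ntargets; i++)
        if (targets[i] > site && targets[i] < site + JMP_LEN)
            return true;
    return false;
}
```

Under the rule stated below, a site qualifies only when `displaced_insns` returns 1, i.e. the probed instruction alone is at least 5 bytes long, so no instruction boundary (and hence no possible branch target) falls inside the jmp.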

As far as I can see, the only safe way to use djprobe remains not to
touch any instruction that is shorter than 5 bytes, and that's assuming
there aren't other limitations, as I mentioned earlier.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 20:21                                 ` Karim Yaghmour
@ 2005-08-01 22:12                                   ` Satoshi Oshima
  2005-08-01 22:54                                     ` Karim Yaghmour
  2005-08-02  9:42                                   ` Mathieu Lacage
  2005-08-02 15:33                                   ` Mathieu Lacage
  2 siblings, 1 reply; 83+ messages in thread
From: Satoshi Oshima @ 2005-08-01 22:12 UTC (permalink / raw)
  To: karim
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, sugita

Thank you for your comment.

Karim Yaghmour wrote:
> Satoshi Oshima wrote:
> 
>>step 2: safety check;
>>         make sure that all CPUs don't run on the code that will
>>         be replaced with jmp instruction (also check whether stack
>>         include EIP of the code which is subject to replace)
> 
> Please explain exactly how you will make sure that there is no pre-existing
> reference to any of the replaced instructions, whether it be on the stack
> or elsewhere. Consider a system that has many thousands of processes running
> in parallel on different CPUs.
> 
> Also consider that you may find things on the stack that look like address
> references to the range you wish to replace, but are actually valid data.

As Masami answered in another thread, we need to divide the problem
according to the conditions below:

1) fully preemptive kernel
2) voluntary or non-preemptive kernel

When 1) is selected, djprobe cannot currently be applied,
so we decided that djprobe functionality will be turned off by
Kconfig (falling back to kprobe).

But in case 2), we believe we can expect that the stack of a
currently sleeping process only includes EIPs at a limited set of
addresses, such as might_resched() or sched(), so djprobe users
must not insert a probe at such points. In my understanding, a
voluntary or non-preemptive kernel doesn't try to preempt
in interrupt context.

In addition, all CPUs run on the bypass code after the int3 bypass
is created. (In other words, once the int3 bypass is set, no CPU
will ever push the address of an instruction being replaced onto
its stack.)

So we need to take care of the EIPs of the current process on all
CPUs and on the interrupt stack. We are now implementing this check
code and will provide it soon.

>>step 3: (after all CPU pass safety check) replace with jmp
>>         instruction without first byte. leave int 3 instruction
>>         unchanged at this time (new step).
> 
> This still fails to cover the very simple case I explained earlier:
> 	if (...)
> 		goto label;
> 	<more code>
> 	single_byte_asm_instruction_code();
> label:
> 	foo();
> 
> You still can't replace the instruction right before the label, and you'd
> have to have an integrated disassembler to go through all the code and
> make sure it too doesn't have a reference to the address of "label:".

I know about that problem. The current djprobe helper script shows
the disassembled code and prompts the user to avoid inserting a
probe at such a place.

We may need to develop a check function to avoid this problem,
but it will be a userland tool. We expect that the translator
would provide these safety checks, if possible.

Though djprobe has a few limitations, we believe that it is
useful for the SystemTap project.

> 
> In as far as I can see, it remains that the only safe way to use djprobe
> is to not touch any instruction that is less than 5 bytes, that's if
> there aren't other limitations as I mentioned earlier.
> 
> Karim

Satoshi


* Re: Hitachi djprobe mechanism
  2005-08-01 22:12                                   ` Satoshi Oshima
@ 2005-08-01 22:54                                     ` Karim Yaghmour
  2005-08-02 18:42                                       ` Satoshi Oshima
  0 siblings, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-01 22:54 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, sugita

Satoshi Oshima wrote:
> As Masami answered in another thread, we need to divide the problem
> depending on the condition below:
> 
> 1) full preemptive kernel
> 2) voluntary or non preemptive kernel

Yes, I have seen this answer, and it is incomplete.

> But the case 2), we believe that we can expect currently
> sleeping process' stack only include EIPs which are limited
> address such as might_resched() or sched(). So djprobe user
> must not insert a probe to such point. In my understanding,
> voluntary or non preemption kernel doesn't try to preempt
> during interruption context.

But what about when the call that caused the resched came from higher
up the call tree, and it is that earlier call that gets squashed by
the insertion of a jump on the instruction preceding it? The only way
you could limit that is if you did a static analysis and forbade any
insertion of probes on any instruction preceding a call that _may_
result in a process scheduling ... Surely you see this can't scale.

> In addition, all CPU run on bypass code after int3 bypass
> is created. (In another word, once int3 bypass would be set,
> all CPU never push replacing instruction address on it's stack)
> 
> So we need to take care of EIPs on current process of all CPUs
> and interrupt stack. Now we are implementing this check code,
> and we will provide soon.

But you have no way to figure out whether what you've found on the
stack is an address to some piece of code or just some valid data ...

> I know that problem. Current djprobe's helper script show
> disassemble code and prompt to avoid inserting a probe code
> into such place.
> 
> We may need to develop check function to avoid this problem,
> but it will be an userland tools. We expect that translator
> would provide these safety check, if possible.

So therefore, what this will do is, for each probe address candidate
for an instruction less than 5 bytes, it will go through all of the
kernel code to make sure that there are no references pointing to the
next instruction(s) ... ? This after having checked on process
stacks to make sure no one has a reference to those same addresses
while somehow figuring out whether what's being looked at is not
some data, but really a return address?

> Though djprobe has a few limitation, we believe that it is
> usefull for SystemTap project.

Like I said before, I can't stop anyone from working on anything,
so what I say here is really noise. Consider, though, that the
proposals being presented here seem to seriously increase in
complexity as more and more limitations of djprobe are explained,
and then weigh that in comparison to the real benefits in terms
of actual usage and general applicability.

Karim

* Re: Hitachi djprobe mechanism
  2005-08-01 20:21                                 ` Karim Yaghmour
  2005-08-01 22:12                                   ` Satoshi Oshima
@ 2005-08-02  9:42                                   ` Mathieu Lacage
  2005-08-02 15:09                                     ` Karim Yaghmour
  2005-10-07 15:35                                     ` Richard J Moore
  2005-08-02 15:33                                   ` Mathieu Lacage
  2 siblings, 2 replies; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-02  9:42 UTC (permalink / raw)
  To: karim; +Cc: systemtap

[-- Attachment #1: Type: text/plain, Size: 2596 bytes --]

[trimming the CC list and assuming all CCed persons are subscribed to
systemtap]

On Mon, 2005-08-01 at 16:31 -0400, Karim Yaghmour wrote:

> > step 3: (after all CPU pass safety check) replace with jmp
> >          instruction without first byte. leave int 3 instruction
> >          unchanged at this time (new step).
> 
> This still fails to cover the very simple case I explained earlier:
> 	if (...)
> 		goto label;
> 	<more code>
> 	single_byte_asm_instruction_code();
> label:
> 	foo();
> 
> You still can't replace the instruction right before the label, and you'd
> have to have an integrated disassembler to go through all the code and
> make sure it too doesn't have a reference to the address of "label:".

This problem probably should be addressed in userspace and the way this
should be solved is by calculating the location of the basic blocks of
the function in which you want to insert the probe. Then, any basic
block bigger than 5 bytes will be an acceptable candidate for probe
insertion.
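A minimal sketch of the basic-block split described here, assuming a disassembler has already decoded the function into a list sorted by address. The types and names are hypothetical (not taken from analysis.pl), and indirect jumps, jump tables, and branches from outside the function are not handled. Leaders are the first instruction, every direct branch target, and every instruction following a jmp/jcc/ret; optionally the instruction after a call is also a leader, since the return address a call pushes is itself a re-entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one decoded instruction. */
enum insn_kind { INSN_PLAIN, INSN_JMP, INSN_JCC, INSN_CALL, INSN_RET };

struct insn {
	uintptr_t addr;
	uint8_t len;
	enum insn_kind kind;
	uintptr_t target; /* direct branch target for JMP/JCC, else 0 */
};

/* Mark basic-block leaders (O(n^2) target lookup, fine for a sketch). */
static void mark_leaders(const struct insn *insns, size_t n,
			 bool split_after_calls, bool *leader)
{
	for (size_t i = 0; i < n; i++)
		leader[i] = (i == 0);

	for (size_t i = 0; i < n; i++) {
		enum insn_kind k = insns[i].kind;

		/* A direct branch target starts a block. */
		if ((k == INSN_JMP || k == INSN_JCC) && insns[i].target)
			for (size_t j = 0; j < n; j++)
				if (insns[j].addr == insns[i].target)
					leader[j] = true;
		/* So does the instruction after a control transfer. */
		if (i + 1 < n &&
		    (k == INSN_JMP || k == INSN_JCC || k == INSN_RET ||
		     (split_after_calls && k == INSN_CALL)))
			leader[i + 1] = true;
	}
}

/* Count blocks with room for a 5-byte jmp (block size >= min_size). */
static size_t count_probeable_blocks(const struct insn *insns, size_t n,
				     const bool *leader, size_t min_size)
{
	size_t count = 0, size = 0;

	for (size_t i = 0; i < n; i++) {
		if (leader[i] && i > 0) {
			count += (size >= min_size);
			size = 0;
		}
		size += insns[i].len;
	}
	if (n > 0)
		count += (size >= min_size);
	return count;
}
```

Splitting after calls can only shrink blocks, so it may lower the number of probe-able blocks; in the test function below it happens to turn one 10-byte block into two 5-byte blocks.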

Clearly, this is one of the reasons the kerninst people built a system-
wide daemon which did perform the basic-block calculation.

The attached ugly perl script evaluates the basic blocks and outputs
statistics about their size. Please, note the "evaluate" verb used
above. It means that I am pretty sure this script is not 100% reliable
but it should give non-skewed results given the size of most binaries.
Beware: this thing will suck away your CPU time.

objdump -d -j .text /usr/lib/libgtk.so |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 97.45
bytes percentage of basic blocks bigger than 5 bytes: 99.68

objdump -d -j .text /usr/lib/libgtk-x11-2.0.so |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 92.87
bytes percentage of basic blocks bigger than 5 bytes: 99.09

objdump -d -j .text /usr/X11R6/bin/X |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 96.63
bytes percentage of basic blocks bigger than 5 bytes: 99.60

objdump -d -j .text /usr/X11R6/lib/libX11.so |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 96.98
bytes percentage of basic blocks bigger than 5 bytes: 99.60

I must say that I am pretty surprised by this rather positive result
which means that if you perform a proper bb-analysis of your binaries,
you should be able to put a probe almost anywhere in your binary without
much complicated instruction relocation work (modulo the issues related
to inserting and removing the probe itself).
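The two percentages printed above can be read as follows. This small sketch (hypothetical, not the logic of analysis.pl) computes both from a list of basic-block sizes: the share of blocks that qualify, and the share of code bytes living in qualifying blocks. The byte-weighted figure is higher because the small blocks that fail the threshold hold few bytes. Blocks count when they have at least `min_size` bytes; whether the script's cutoff is strict is not stated.

```c
#include <assert.h>
#include <stddef.h>

struct bb_stats {
	double pct_blocks; /* % of blocks with size >= min_size */
	double pct_bytes;  /* % of code bytes that live in such blocks */
};

static struct bb_stats bb_size_stats(const size_t *sizes, size_t n,
				     size_t min_size)
{
	size_t big_blocks = 0, big_bytes = 0, total_bytes = 0;
	struct bb_stats s = {0.0, 0.0};

	for (size_t i = 0; i < n; i++) {
		total_bytes += sizes[i];
		if (sizes[i] >= min_size) {
			big_blocks++;
			big_bytes += sizes[i];
		}
	}
	if (n > 0)
		s.pct_blocks = 100.0 * big_blocks / n;
	if (total_bytes > 0)
		s.pct_bytes = 100.0 * big_bytes / total_bytes;
	return s;
}
```

For block sizes {2, 8, 10, 20} and `min_size` = 5, 75% of the blocks but 95% of the bytes qualify.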

regards,
Mathieu
-- 

[-- Attachment #2: analysis.pl --]
[-- Type: application/x-perl, Size: 9395 bytes --]


* Re: Hitachi djprobe mechanism
  2005-08-02  9:42                                   ` Mathieu Lacage
@ 2005-08-02 15:09                                     ` Karim Yaghmour
  2005-10-07 15:35                                     ` Richard J Moore
  1 sibling, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-02 15:09 UTC (permalink / raw)
  To: Mathieu Lacage; +Cc: systemtap


Mathieu Lacage wrote:
> This problem probably should be addressed in userspace and the way this
> should be solved is by calculating the location of the basic blocks of
> the function in which you want to insert the probe. Then, any basic
> block bigger than 5 bytes will be an acceptable candidate for probe
> insertion.
> 
> Clearly, this is one of the reasons the kerninst people built a system-
> wide daemon which did perform the basic-block calculation.
> 
> The attached ugly perl script evaluates the basic blocks and outputs
> statistics about their size. Please, note the "evaluate" verb used
> above. It means that I am pretty sure this script is not 100% reliable
> but it should give non-skewed results given the size of most binaries.
> Beware: this thing will suck away your CPU time.

This would certainly be a step in the right direction. I too am surprised
by the results. It would really be interesting to see how this compares
to the output from the compiler, as Michel suggests. ... there's still
that problem of finding out what's on a process' stack though ...

Karim

* Re: Hitachi djprobe mechanism
  2005-08-01 20:21                                 ` Karim Yaghmour
  2005-08-01 22:12                                   ` Satoshi Oshima
  2005-08-02  9:42                                   ` Mathieu Lacage
@ 2005-08-02 15:33                                   ` Mathieu Lacage
  2005-08-02 15:36                                     ` Mathieu Lacage
  2005-08-02 16:12                                     ` Karim Yaghmour
  2 siblings, 2 replies; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-02 15:33 UTC (permalink / raw)
  To: karim; +Cc: systemtap

[trimming CC list again]

On Mon, 2005-08-01 at 16:31 -0400, Karim Yaghmour wrote:
> Satoshi Oshima wrote:
> > step 2: safety check;
> >          make sure that all CPUs don't run on the code that will
> >          be replaced with jmp instruction (also check whether stack
> >          include EIP of the code which is subject to replace)
> 
> Please explain exactly how you will make sure that there is no pre-existing
> reference to any of the replaced instructions, whether it be on the stack
> or elsewhere. Consider a system that has many thousands of processes running
> in parallel on different CPUs.
> 
> Also consider that you may find things on the stack that look like address
> references to the range you wish to replace, but are actually valid data.

I have probably missed something here but I would appreciate if you
could point me to my mistake.

The only reasonable reason why you would see the EIP of an instruction
somewhere on the stack is because it was pushed there as:
  - a function argument as a function callback.
  - the return address of a call statement.

In both cases, these EIPs represent the start of a basic block which
means that, if you follow my earlier suggestion of calculating carefully
the complete basic-block tree of your functions to avoid inserting
probes at basic-block boundaries, you should be able to always ensure
that these EIPs stay valid. At worst, if they are used by some piece of
code, the code will jump back to the old basic-block which will
immediately jump to its probe because its first instruction is the probe
jump.
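Under this rule the safety check reduces to one condition: no basic-block leader (branch target, return address, callback entry) may lie strictly inside the displaced range. A leader exactly at the probe address is fine, since a stale EIP pointing there lands on the new jmp. A sketch, assuming the leaders have already been computed and sorted in ascending order; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: the patch covering [probe_addr, probe_addr + extent)
 * stays within one basic block iff no leader lies strictly inside
 * (probe_addr, probe_addr + extent). `leaders` must be sorted ascending.
 */
static bool patch_within_one_block(const uintptr_t *leaders, size_t n,
				   uintptr_t probe_addr, size_t extent)
{
	size_t lo = 0, hi = n;

	/* Binary search for the first leader strictly above probe_addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (leaders[mid] <= probe_addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Safe if there is no such leader, or it is past the patched bytes. */
	return lo == n || leaders[lo] >= probe_addr + extent;
}
```

Here `extent` is the number of bytes actually displaced, i.e. 5 rounded up to a whole instruction boundary.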

regards,
Mathieu
-- 


* Re: Hitachi djprobe mechanism
  2005-08-02 15:33                                   ` Mathieu Lacage
@ 2005-08-02 15:36                                     ` Mathieu Lacage
  2005-08-02 16:12                                     ` Karim Yaghmour
  1 sibling, 0 replies; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-02 15:36 UTC (permalink / raw)
  To: karim; +Cc: systemtap

On Tue, 2005-08-02 at 17:32 +0200, Mathieu Lacage wrote:
> [trimming CC list again]
> 
> On Mon, 2005-08-01 at 16:31 -0400, Karim Yaghmour wrote:
> > Satoshi Oshima wrote:
> > > step 2: safety check;
> > >          make sure that all CPUs don't run on the code that will
> > >          be replaced with jmp instruction (also check whether stack
> > >          include EIP of the code which is subject to replace)
> > 
> > Please explain exactly how you will make sure that there is no pre-existing
> > reference to any of the replaced instructions, whether it be on the stack
> > or elsewhere. Consider a system that has many thousands of processes running
> > in parallel on different CPUs.
> > 
> > Also consider that you may find things on the stack that look like address
> > references to the range you wish to replace, but are actually valid data.
> 
> I have probably missed something here but I would appreciate if you
> could point me to my mistake.
> 
> The only reasonable reason why you would see the EIP of an instruction
> somewhere on the stack is because it was pushed there as:
>   - a function argument as a function callback.
>   - the return address of a call statement.
> 
> In both cases, these EIPs represent the start of a basic block which
> means that, if you follow my earlier suggestion of calculating carefully
> the complete basic-block tree of your functions to avoid inserting
> probes at basic-block boundaries, you should be able to always ensure

I meant "across basic-block boundaries" and not "at basic-block
boundaries".

> that these EIPs stay valid. At worst, if they are used by some piece of
> code, the code will jump back to the old basic-block which will
> immediately jump to its probe because its first instruction is the probe
> jump.

Mathieu
-- 


* Re: Hitachi djprobe mechanism
  2005-08-02 15:33                                   ` Mathieu Lacage
  2005-08-02 15:36                                     ` Mathieu Lacage
@ 2005-08-02 16:12                                     ` Karim Yaghmour
  2005-08-02 16:30                                       ` Mathieu Lacage
  1 sibling, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-02 16:12 UTC (permalink / raw)
  To: Mathieu Lacage; +Cc: systemtap

Mathieu Lacage wrote:
> I have probably missed something here but I would appreciate if you
> could point me to my mistake.

I don't see a mistake, you just added the restriction that actually makes
this a saner discussion: using basic blocks instead of arbitrary jmp
insertions.

> The only reasonable reason why you would see the EIP of an instruction
> somewhere on the stack is because it was pushed there as:
>   - a function argument as a function callback.
>   - the return address of a call statement.
> 
> In both cases, these EIPs represent the start of a basic block which
> means that, if you follow my earlier suggestion of calculating carefully
> the complete basic-block tree of your functions to avoid inserting
> probes at basic-block boundaries, you should be able to always ensure
> that these EIPs stay valid. At worst, if they are used by some piece of
> code, the code will jump back to the old basic-block which will
> immediately jump to its probe because its first instruction is the probe
> jump.

As long as the insertions do not cross basic-block boundaries, this
would seem to hold, and it would seem to solve the main issues I had
with this approach. I haven't fully looked at the implications of
this yet, and there may still be some corner cases (like copy_to_user()
and the other CPU-related stuff), but it's certainly a refreshing
take on things.

Clearly then it would become necessary to have a basic block analyzer
that runs at least once prior to inserting any such probes. It would
certainly be interesting to see what the compiler reports in terms of
basic block sizes.

BTW, looking quickly at your script, it doesn't seem to look for "call".
Maybe someone should try compiling the kernel with -fprofile-arcs and
-ftest-coverage to get the .bb and .bbg files, and run an analysis based
on those files.

Thanks for inserting some sanity to this thread.

Side note: None of this will hold with kernels that have preemption
enabled. And from what I can gather from talking about this with folks
at OLS, preempt_rt seems to be on its way in. So as long as we are
talking about non-preemptible kernels, what you suggest should hold.

Karim

* Re: Hitachi djprobe mechanism
  2005-08-02 16:12                                     ` Karim Yaghmour
@ 2005-08-02 16:30                                       ` Mathieu Lacage
  2005-08-02 16:46                                         ` Karim Yaghmour
  2005-08-04 17:09                                         ` Mathieu Lacage
  0 siblings, 2 replies; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-02 16:30 UTC (permalink / raw)
  To: karim; +Cc: systemtap

On Tue, 2005-08-02 at 12:22 -0400, Karim Yaghmour wrote:

> In as far as the insertions do not cross basic-block boundaries, then
> this would seem to hold, and it would seem to solve the main issues I
> had with this approach. I haven't really looked at the implications of
> this entirely, and there may be some corner cases still (like
> copy_to_user() and the other CPU-related stuff) but it's certainly a refreshing
> take on things.
> 
> Clearly then it would become necessary to have a basic block analyzer
> that runs at least once prior to inserting any such probes. It would

Yes. I might give a try at writing a real one sometime soon. This is
just too interesting for me not to pursue it a bit further :)

> certainly be interesting to see what the compiler reports in terms of
> basic block sizes.
> 
> BTW, looking rapidely at your script, it doesn't seem to look for "call".

Yes, I assumed that all "call"s return someday (which is obviously wrong
for a few prominent functions in the kernel) but I did not really think
about the issue of EIPs located on the stack when I wrote that script.

I am also pretty sure that there are a few bugs in this code, most
notably one which you will get if you see the "old bb empty" message. I
have not yet figured out how this code path can be triggered (it
shouldn't).

If you take the constraint of call-pushed EIPs on the stack into
account, you need to change the script to split the bbs right after
calls. I will do this tonight and see if it changes the results notably.

> Maybe someone should try compiling the kernel with -fprofile-arcs and
> -ftest-coverage to get the .bb and .bbg files, and run an analysis based
> on those files.

Do you know of any documentation on the format of these files other than
the gcc source code ? I must confess that I did not know these options
actually generated a dump of the basic blocks but now that I think about
it, it is quite obvious...

Mathieu
-- 


* Re: Hitachi djprobe mechanism
  2005-08-02 16:30                                       ` Mathieu Lacage
@ 2005-08-02 16:46                                         ` Karim Yaghmour
  2005-08-04 17:09                                         ` Mathieu Lacage
  1 sibling, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-02 16:46 UTC (permalink / raw)
  To: Mathieu Lacage; +Cc: systemtap

Mathieu Lacage wrote:
> Yes. I might give a try at writing a real one sometime soon. This is
> just too interesting for me not to pursue it a bit further :)

More power to you :)

> Do you know of any documentation on the format of these files other than
> the gcc source code ? I must confess that I did not know these options
> actually generated a dump of the basic blocks but now that I think about
> it, it is quite obvious...

Try the gcov source code. gcov is actually what ends up reading those
files typically to provide coverage analysis, so I'm pretty sure it's
got all the logic in there to decode the format. Come to think of it,
it may actually make sense to patch gcov to do what you're looking for
instead of rewriting something from scratch.

You could also try to google around, maybe someone somewhere wrote
something about the .bb and .bbg files. Also, maybe someone on the
list actually has some relevant pointers.

Karim

* Re: Hitachi djprobe mechanism
  2005-08-01 22:54                                     ` Karim Yaghmour
@ 2005-08-02 18:42                                       ` Satoshi Oshima
  2005-08-03 14:50                                         ` Karim Yaghmour
  2005-08-04  1:19                                         ` Mathieu Desnoyers
  0 siblings, 2 replies; 83+ messages in thread
From: Satoshi Oshima @ 2005-08-02 18:42 UTC (permalink / raw)
  To: karim
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, sugita

Karim Yaghmour wrote:
>>But the case 2), we believe that we can expect currently
>>sleeping process' stack only include EIPs which are limited
>>address such as might_resched() or sched(). So djprobe user
>>must not insert a probe to such point. In my understanding,
>>voluntary or non preemption kernel doesn't try to preempt
>>during interruption context.
> 
> But what about when the call that caused the resched came from higher
> up the call tree and that it is that former call that is getting
> squashed by the insertion of a jump on the instruction preceeding it.

I see.

We should add another entry to the djprobe limitation list.
The current list is ...
------------------------------------------

limitation of djprobe

djprobe users must avoid inserting a probe into the places below:

code includes relative jmp instruction
code includes call instruction
code includes int instruction
functions that preempt current process such as sched() or might_resched()

------------------------------------------

Anything else?
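A rough first-pass filter for the first three entries of this list, sketched in C. It assumes 32-bit x86 code in which each instruction starts at its opcode byte with no prefixes, and it would be applied to every instruction the 5-byte jmp displaces. The function name is hypothetical, and a real tool would need a full instruction decoder. The fourth entry (code that may schedule) cannot be detected from opcodes at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical filter: returns true if the instruction at `code` is a
 * relative jmp/jcc/loop, a call, or an int. Assumes 32-bit x86 and no
 * prefix bytes before the opcode; `avail` is the number of readable bytes.
 */
static bool insn_forbidden_for_djprobe(const uint8_t *code, size_t avail)
{
	if (avail == 0)
		return true;                /* nothing to decode: be conservative */

	uint8_t op = code[0];

	if (op == 0xE8 || op == 0x9A)       /* call rel32, call far absolute */
		return true;
	if (op == 0xE9 || op == 0xEB)       /* jmp rel32, jmp rel8 */
		return true;
	if (op >= 0x70 && op <= 0x7F)       /* jcc rel8 */
		return true;
	if (op >= 0xE0 && op <= 0xE3)       /* loopne, loope, loop, jecxz */
		return true;
	if (op == 0xCC || op == 0xCD || op == 0xCE) /* int3, int imm8, into */
		return true;
	if (op == 0x0F || op == 0xFF) {
		if (avail < 2)
			return true;        /* truncated: be conservative */
		if (op == 0x0F)             /* 0F 80..8F: jcc rel32 */
			return code[1] >= 0x80 && code[1] <= 0x8F;
		/* FF: ModRM.reg 2 and 3 select indirect near/far call */
		uint8_t reg = (code[1] >> 3) & 7;
		return reg == 2 || reg == 3;
	}
	return false;
}
```

For example, `E8 ...` (call), `74 05` (jz), `CD 80` (int $0x80), and `FF D0` (call *%eax) are rejected, while `55` (push %ebp) and `89 E5` (mov %esp,%ebp) pass.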

> The only way you could limit that is if you did a static analysis
> and forbade any insertion of probes on any instruction preceeding
> a call that _may_ result in a process scheduling ... Surely you see
> this can't scale.

I don't see why that analysis is required.
We can simply advise users to avoid call
instructions.

The problem is EIPs on the stack that point into the code being
replaced. So there is no problem as long as users don't
try to replace a call instruction.

>>In addition, all CPU run on bypass code after int3 bypass
>>is created. (In another word, once int3 bypass would be set,
>>all CPU never push replacing instruction address on it's stack)
>>
>>So we need to take care of EIPs on current process of all CPUs
>>and interrupt stack. Now we are implementing this check code,
>>and we will provide soon.
> 
> But you have no way to figure out whether what you've found on the
> stack is an address to some piece of code or just some valid data ...

We are implementing two different ways to check this.

First one:
Each interrupt handler pushes the EIP onto a djprobe per-CPU
data structure before calling do_irq or something similar,
and pops the EIP after returning. For the safety check,
djprobe looks through these pushed EIPs.

This way, djprobe can easily check the EIPs that are on the stacks.

But we are afraid that upstream would not accept this
approach, so we are now trying another one.

Second one:
Simply look through the current stack and the interrupt stack.
djprobe may find data that happens to equal an address being
replaced. When that happens, djprobe can simply postpone the
replacement and wait for the next check.

This implementation adds some delay before int 3 is replaced with
jmp. But the probe remains functional via kprobe in the meantime,
and there is no other side effect: the probe cost is the same as
kprobe.
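The second approach amounts to a conservative scan, sketched below under stated assumptions: the caller already has a snapshot of each stack to check as an array of machine words, and all names are hypothetical. Any word strictly inside the displaced range is treated as a possible saved EIP, so a false positive from ordinary data only postpones the int3-to-jmp swap. Which stacks must be scanned (and whether sleeping tasks can be skipped) is exactly what is being debated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum swap_decision { SWAP_NOW, SWAP_POSTPONE };

/*
 * Hypothetical conservative scan: every word in the stack snapshot is
 * treated as a possible return address. A word strictly inside
 * (probe_addr, probe_addr + extent) could be a saved EIP pointing into the
 * middle of the displaced instructions, so the swap is postponed. A word
 * equal to probe_addr is harmless: returning there executes the jmp.
 */
static enum swap_decision check_stack_words(const uintptr_t *stack,
					    size_t nwords,
					    uintptr_t probe_addr,
					    size_t extent)
{
	for (size_t i = 0; i < nwords; i++)
		if (stack[i] > probe_addr && stack[i] < probe_addr + extent)
			return SWAP_POSTPONE;
	return SWAP_NOW;
}
```

The design trades precision for safety: it never needs to decide whether a word is really a return address, at the cost of occasionally retrying.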

>>I know that problem. Current djprobe's helper script show
>>disassemble code and prompt to avoid inserting a probe code
>>into such place.
>>
>>We may need to develop check function to avoid this problem,
>>but it will be an userland tools. We expect that translator
>>would provide these safety check, if possible.
> 
> So therefore, what this will do is, for each probe address candidate
> for an instruction less than 5 bytes, it will go through all of the
> kernel code to make sure that there are no references pointing to the
> next instruction(s) ... ? This after having checked on process
> stacks to make sure no one has a reference to those same addresses
> while somehow figuring out whether what's being looked at is not
> some data, but really a return address?

Currently we have no plan to restrict djprobe from being used on
instructions shorter than 5 bytes. But if we were to move to that,
djprobe would not need to check the stack at all. A stack holding
the address being replaced is not a problem if the candidate
instruction is 5 bytes or longer, because a processor returning to
that address executes the jmp instruction in place of the replaced
code or the int 3 instruction.

Satoshi


* Re: Hitachi djprobe mechanism
  2005-08-01 19:57                               ` Satoshi Oshima
  2005-08-01 20:21                                 ` Karim Yaghmour
@ 2005-08-03 14:46                                 ` Andi Kleen
  1 sibling, 0 replies; 83+ messages in thread
From: Andi Kleen @ 2005-08-03 14:46 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Karim Yaghmour, Masami Hiramatsu,
	michel.dagenais, Roland McGrath, sugita

> step 2: safety check;
>         make sure that all CPUs don't run on the code that will
>         be replaced with jmp instruction (also check whether stack
>         include EIP of the code which is subject to replace)

I don't think there is a race-free way to do this without an IPI to
all CPUs. And if you do that, you might as well do it more simply.

> step 4: i-cache flush or serializing:
>         invoke i-cache flush instruction such as CLFLUSH or serialize
>         instruction such as CPUID on all CPUs (new step)

I'm not sure these flushes are needed. With IPIs certainly not.

-Andi


* Re: Hitachi djprobe mechanism
  2005-08-02 18:42                                       ` Satoshi Oshima
@ 2005-08-03 14:50                                         ` Karim Yaghmour
  2005-08-04  1:19                                         ` Mathieu Desnoyers
  1 sibling, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-03 14:50 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, sugita

Just some short comments.

Satoshi Oshima wrote:
> I see.
> 
> We should add another limitation to djprobe limitation list.
> Current list is ...
> ------------------------------------------
> 
> limitation of djprobe
> 
> djprobe user must avoid inserting a probe into the place below:

I guess you mean "instructions shorter than 5 bytes that are right
before ..."

> code includes relative jmp instruction
> code includes call instruction
> code includes int instruction
> functions that preempt current process such as sched() or might_resched()

Actually for this last one, you will likely find that a lot of
functions actually do result in scheduling possibilities. So
this last requirement is rather difficult to follow.

Generally, I think I've already explained well enough the reasons
why some assumptions currently being made warrant some serious
examination. Your response is certainly a step in the right direction,
and I encourage you to continue looking at these issues closely.
If nothing else, you've got people like Mathieu who genuinely
understand the implications and should help guide you further.

Karim

* Re: Hitachi djprobe mechanism
  2005-08-02 18:42                                       ` Satoshi Oshima
  2005-08-03 14:50                                         ` Karim Yaghmour
@ 2005-08-04  1:19                                         ` Mathieu Desnoyers
  2005-08-04  3:31                                           ` Mathieu Desnoyers
  1 sibling, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-04  1:19 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: karim, Richard J Moore, systemtap, Andi Kleen, Masami Hiramatsu,
	Masami Hiramatsu, michel.dagenais, Roland McGrath, sugita

* Satoshi Oshima (soshima@redhat.com) wrote:
> I see.
> 
> We should add another limitation to djprobe limitation list.
> Current list is ...
> ------------------------------------------
> 
> limitation of djprobe
> 
> djprobe user must avoid inserting a probe into the place below:
> 
> code includes relative jmp instruction
> code includes call instruction
> code includes int instruction

Well... When you say "code includes int instruction", this is really not what I
mean by "code being interrupted".

Interrupts are asynchronous to the executing code; they may happen anywhere
interrupts are not disabled. You can still have an int instruction which
synchronously raises an interrupt, and yes, it's not safe to overwrite those. But
the primary problem is asynchronous interrupts.

> functions that preempt current process such as sched() or might_resched()
> 

Well, if you run a voluntarily preemptible kernel, those will be explicit
calls. On the other hand, with a fully preemptible kernel the scheduler can
be called from anywhere in your code (through an asynchronous interrupt).
Any place where interrupts or preemption are not disabled is
at risk.

> >The only way you could limit that is if you did a static analysis
> >and forbade any insertion of probes on any instruction preceding
> >a call that _may_ result in a process scheduling ... Surely you see
> >this can't scale.
> 
> I don't see why that analysis is required.
> We can simply suggest that user should avoid a call
> instruction.
> 
> The problem is EIPs which is included with replacing
> code on stack. So there is no problem when they don't
> try to replace call instruction.
>

Asynchronous interrupts will return to any instruction which is not in a zone
where interrupts are disabled. No call instruction is needed to hit this problem.
Well, in fact, it is even worse: non-maskable interrupts can return _anywhere_,
except in the fault handler code (a double fault is handled by an abort, if I
remember well).

> 
> >>In addition, all CPU run on bypass code after int3 bypass
> >>is created. (In another word, once int3 bypass would be set,
> >>all CPU never push replacing instruction address on it's stack)
> >>
> >>So we need to take care of EIPs on current process of all CPUs
> >>and interrupt stack. Now we are implementing this check code,
> >>and we will provide soon.
> >
> >But you have no way to figure out whether what you've found on the
> >stack is an address to some piece of code or just some valid data ...
> 
> We are implementing two different way to check this.
> 
> First one:
> Each interrupt handler push EIP on the stack to djprobe's
> per cpu data structure before calling do_irq or something,
> and pop EIP after returning. For checking safety,
> djprobe look through this pushed EIPs.
> 
> djprobe can easily check EIPs which are included on stacks.
> 
> But we are afraid that upstream would not accept this
> approach. So we are now trying another one.
>

Well, it will clearly have a performance cost on live systems, which I am not
sure many people will like.
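For reference, the first approach (each interrupt entry path records the interrupted EIP in a per-CPU area, and djprobe checks those records before switching from int3 to jmp) might look roughly like the C sketch below. All names (`record_push`, `record_pop`, `patch_window_busy`) and the fixed CPU count and nesting depth are hypothetical, not djprobe or kernel interfaces; the per-interrupt push/pop is the cost mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS  4   /* hypothetical CPU count */
#define SKETCH_MAX_NEST 8   /* hypothetical maximum interrupt nesting depth */

/* Hypothetical per-CPU record of EIPs saved on interrupt entry. */
struct eip_record {
    uintptr_t eip[SKETCH_MAX_NEST];
    size_t depth;   /* may exceed SKETCH_MAX_NEST if nesting overflows */
};

static struct eip_record eip_records[SKETCH_NR_CPUS];

/* Called on interrupt entry, before the real handler runs. */
static void record_push(int cpu, uintptr_t eip)
{
    struct eip_record *r = &eip_records[cpu];
    if (r->depth < SKETCH_MAX_NEST)
        r->eip[r->depth] = eip;
    r->depth++;   /* keep counting so pushes and pops stay balanced */
}

/* Called after the handler returns. */
static void record_pop(int cpu)
{
    struct eip_record *r = &eip_records[cpu];
    if (r->depth > 0)
        r->depth--;
}

/* True if some interrupted context would resume strictly inside
 * [addr, addr + len).  An EIP equal to addr is fine: it resumes at the
 * first byte, where the int3 or the new jmp starts. */
static bool patch_window_busy(uintptr_t addr, size_t len)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        const struct eip_record *r = &eip_records[cpu];
        if (r->depth > SKETCH_MAX_NEST)
            return true;   /* lost track of some EIPs: be conservative */
        for (size_t i = 0; i < r->depth; i++)
            if (r->eip[i] > addr && r->eip[i] < addr + len)
                return true;
    }
    return false;
}
```

The check is exact only because every entry path records its EIP; any path that skips `record_push` silently breaks the guarantee.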

> 
> Second one:
> Simply looking through current stack and interruption stack.
> djprobe may find the data that is same to an address to replace.
> When it would happen, djprobe can easily postpone to replace
> and wait for next check.
> 
> This implementation brings some delay to replace int 3 with
> jmp. But probe code is still valid by kprobe and there is
> no other side effect. Probe cost is same as kprobe.
> 

How do you plan to check all processors' stacks?
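A minimal sketch of the second approach, assuming the caller can already obtain a snapshot of every relevant stack (which is exactly the open question above): every stack word is treated as a possible saved EIP, and the int3-to-jmp switch is postponed if any word points strictly inside the bytes being replaced. The function names are hypothetical. A false positive (ordinary data that happens to look like such an address) only delays the switch, which matches the "postpone and wait for next check" behaviour in the quoted text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative scan of one stack snapshot: any word pointing strictly
 * inside [addr, addr + len) might be a saved EIP. */
static bool stack_may_resume_inside(const uintptr_t *stack, size_t nwords,
                                    uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < nwords; i++)
        if (stack[i] > addr && stack[i] < addr + len)
            return true;   /* real EIP or coincidental data: cannot tell */
    return false;
}

/* Hypothetical driver: stacks[] holds snapshots of every stack that could
 * hold a return address (each CPU's current stack, interrupt stacks, ...). */
static bool safe_to_switch_to_jmp(const uintptr_t *const stacks[],
                                  const size_t sizes[], size_t nstacks,
                                  uintptr_t addr, size_t len)
{
    for (size_t s = 0; s < nstacks; s++)
        if (stack_may_resume_inside(stacks[s], sizes[s], addr, len))
            return false;   /* retry later; the int3 probe stays active */
    return true;
}
```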

> Currently we have no plan to limit djprobe not to use for
> less than 5 bytes instruction. But when we would move to it,
> djprobe will not provide any check on stack. There is no
> problem when a stack has the same addresses to replace
> if the candidate is more than 4 byte. Because a processor
> can run jmp instruction instead of replaced code or int 3
> instruction.
> 

Instruction cache coherency might be a problem there, even if the instruction to
replace is bigger than 5 bytes. You have to make sure the instruction cache of
each CPU is flushed before they go back to this modified section.

Mathieu



OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-04  1:19                                         ` Mathieu Desnoyers
@ 2005-08-04  3:31                                           ` Mathieu Desnoyers
  0 siblings, 0 replies; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-04  3:31 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: karim, Richard J Moore, systemtap, Andi Kleen, Masami Hiramatsu,
	Masami Hiramatsu, michel.dagenais, Roland McGrath, sugita

Points about interrupts are answered by your later posts. Thanks.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-02 16:30                                       ` Mathieu Lacage
  2005-08-02 16:46                                         ` Karim Yaghmour
@ 2005-08-04 17:09                                         ` Mathieu Lacage
  1 sibling, 0 replies; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-04 17:09 UTC (permalink / raw)
  To: karim; +Cc: systemtap

On Tue, 2005-08-02 at 18:29 +0200, Mathieu Lacage wrote:

> If you take the constraint of call-pushed EIPs on the stack into
> account, you need to change the script to split the bbs right after
> calls. I will do this tonight and see if it changes the results notably.

/usr/X11R6/bin/X
-- split bbs on call
percentage of basic blocks bigger than 5 bytes: 94.96
bytes percentage of basic blocks bigger than 5 bytes: 99.16
-- do not split bbs on call
percentage of basic blocks bigger than 5 bytes: 96.63
bytes percentage of basic blocks bigger than 5 bytes: 99.60

/usr/X11R6/lib/libX11.so
-- split bbs on call
percentage of basic blocks bigger than 5 bytes: 94.58
bytes percentage of basic blocks bigger than 5 bytes: 99.04
-- do not split bbs on call
percentage of basic blocks bigger than 5 bytes: 96.98
bytes percentage of basic blocks bigger than 5 bytes: 99.60

The modified script is available at
http://www-sop.inria.fr/dream/personnel/Mathieu.Lacage/analysis2.pl

regards,
Mathieu

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-02  9:42                                   ` Mathieu Lacage
  2005-08-02 15:09                                     ` Karim Yaghmour
@ 2005-10-07 15:35                                     ` Richard J Moore
  2005-10-08 18:33                                       ` mathieu lacage
  1 sibling, 1 reply; 83+ messages in thread
From: Richard J Moore @ 2005-10-07 15:35 UTC (permalink / raw)
  To: Mathieu Lacage; +Cc: systemtap

I've been back through the discussion on placement of a djprobe jmp on an
instruction shorter than the jmp. I don't see any resolution to this. As
far as I can see there is no safe way to overlay an instruction shorter than
a jmp with a jmp. So for x86, djprobes would have to be
excluded from probepoints on instructions shorter than 5 bytes.

I don't see why block analysis is helpful. Unless one can guarantee fixing
up all jmps to an instruction following the probed instruction, we
simply cannot allow a jmp to overlay anything smaller than its length.

So are we agreed that djprobe only operates under x86 on instructions >= 5
bytes?
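To make the size constraint concrete: on 32-bit x86 the jmp in question is the near `jmp rel32` form, opcode 0xE9 followed by a 32-bit little-endian displacement measured from the end of the jmp, so it occupies 5 bytes. The sketch below encodes it and expresses the conservative ">= 5 bytes" rule; the function names are illustrative only, and 32-bit addresses are assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5   /* opcode 0xE9 + 32-bit displacement */

/* Encode "jmp target" located at address "at" into out[0..4].
 * The displacement is relative to the end of the jmp (at + 5). */
static void encode_jmp_rel32(uint8_t out[JMP_REL32_LEN], uint32_t at, uint32_t target)
{
    uint32_t rel = target - (at + JMP_REL32_LEN);   /* two's complement, mod 2^32 */
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i));     /* little-endian */
}

/* Conservative rule: only overwrite a single instruction that is at
 * least as long as the jmp itself. */
static bool jmp_fits_in_insn(size_t insn_len)
{
    return insn_len >= JMP_REL32_LEN;
}
```

For example, a jmp at 0x1000 to 0x2000 encodes as E9 FB 0F 00 00 (displacement 0x2000 - 0x1005).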

- -
Richard J Moore
IBM Linux Technology Centre

Mathieu Lacage <Mathieu.Lacage@sophia.inria.fr>
Sent by: systemtap-owner@sources.redhat.com
02/08/2005 10:40
To: karim@opersys.com
cc: systemtap@sources.redhat.com
Subject: Re: Hitachi djprobe mechanism

[trimming the CC list and assuming all CCed persons are subscribed to
systemtap]

On Mon, 2005-08-01 at 16:31 -0400, Karim Yaghmour wrote:

> > step 3: (after all CPU pass safety check) replace with jmp
> >          instruction without first byte. leave int 3 instruction
> >          unchanged at this time (new step).
>
> This still fails to cover the very simple case I explained earlier:
>     if (...)
>           goto label;
>     <more code>
>     single_byte_asm_instruction_code();
> label:
>     foo();
>
> You still can't replace the instruction right before the label, and you'd
> have to have an integrated disassembler to go through all the code and
> make sure it too doesn't have a reference to the address of "label:".
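The write ordering in the quoted step 3 can be simulated on a plain byte buffer: place the int3 first, then write the four displacement bytes of the jmp while the int3 still guards byte 0, and only then replace the int3 with the jmp opcode in a single one-byte store. This hypothetical sketch shows only the byte order; it does not model cross-CPU synchronization or instruction-cache serialization, and it does nothing about the goto/label case above.

```c
#include <stdint.h>
#include <string.h>

enum patch_stage { STAGE_INT3, STAGE_JMP_TAIL, STAGE_JMP };

/* Apply one stage of the staged replacement to the 5 bytes at "site".
 * "jmp" holds the fully encoded jmp (0xE9 followed by rel32). */
static void apply_stage(uint8_t site[5], const uint8_t jmp[5], enum patch_stage stage)
{
    switch (stage) {
    case STAGE_INT3:
        site[0] = 0xCC;                 /* int3: hits go through the kprobe path */
        break;
    case STAGE_JMP_TAIL:
        memcpy(site + 1, jmp + 1, 4);   /* displacement only; int3 still guards byte 0 */
        break;
    case STAGE_JMP:
        site[0] = jmp[0];               /* one-byte store turns int3 into jmp */
        break;
    }
}
```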

This problem probably should be addressed in userspace and the way this
should be solved is by calculating the location of the basic blocks of
the function in which you want to insert the probe. Then, any basic
block bigger than 5 bytes will be an acceptable candidate for probe
insertion.

Clearly, this is one of the reasons the kerninst people built a system-wide
daemon which did perform the basic-block calculation.

The attached ugly perl script evaluates the basic blocks and outputs
statistics about their size. Please note the "evaluate" verb used
above: I am pretty sure this script is not 100% reliable,
but it should give unskewed results given the size of most binaries.
Beware: this thing will suck away your CPU time.

objdump -d -j .text /usr/lib/libgtk.so |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 97.45
bytes percentage of basic blocks bigger than 5 bytes: 99.68

objdump -d -j .text /usr/lib/libgtk-x11-2.0.so |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 92.87
bytes percentage of basic blocks bigger than 5 bytes: 99.09

objdump -d -j .text /usr/X11R6/bin/X |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 96.63
bytes percentage of basic blocks bigger than 5 bytes: 99.60

objdump -d -j .text /usr/X11R6/lib/libX11.so |./analysis.pl --print-stats
[...]
percentage of basic blocks bigger than 5 bytes: 96.98
bytes percentage of basic blocks bigger than 5 bytes: 99.60

I must say that I am pretty surprised by this rather positive result
which means that if you perform a proper bb-analysis of your binaries,
you should be able to put a probe almost anywhere in your binary without
much complicated instruction relocation work (modulo the issues related
to inserting and removing the probe itself).
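The two figures printed above are presumably a count-weighted and a byte-weighted percentage. Since analysis.pl is no longer attached, the sketch below is only a guess at that computation. It assumes block sizes have already been extracted and uses "at least 5 bytes" (the jmp length) as the threshold; the script's exact comparison may differ.

```c
#include <stddef.h>

struct bb_stats {
    double block_pct;   /* % of blocks that are large enough */
    double byte_pct;    /* % of code bytes that lie in such blocks */
};

static struct bb_stats compute_bb_stats(const size_t *block_sizes, size_t n,
                                        size_t min_size)
{
    size_t big_blocks = 0, big_bytes = 0, total_bytes = 0;
    for (size_t i = 0; i < n; i++) {
        total_bytes += block_sizes[i];
        if (block_sizes[i] >= min_size) {
            big_blocks++;
            big_bytes += block_sizes[i];
        }
    }
    struct bb_stats s = {0.0, 0.0};
    if (n > 0)
        s.block_pct = 100.0 * big_blocks / n;
    if (total_bytes > 0)
        s.byte_pct = 100.0 * big_bytes / total_bytes;
    return s;
}
```

The byte-weighted figure is always the higher one here because the few small blocks contribute few bytes, which is consistent with the pattern in the output above.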

regards,
Mathieu
--

#### analysis.pl has been deleted from this note on 03 August 2005 by
Richard J Moore (it was saved in his MyAttachments repository)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-07 15:35                                     ` Richard J Moore
@ 2005-10-08 18:33                                       ` mathieu lacage
  2005-10-08 21:59                                         ` Richard J Moore
  0 siblings, 1 reply; 83+ messages in thread
From: mathieu lacage @ 2005-10-08 18:33 UTC (permalink / raw)
  To: Richard J Moore; +Cc: systemtap

hi richard,

 >I don't see why block analysis is helpful. Unless one can guarantee fixing
 >up all jmp to an instruction following the probed instruction then we
 >simply cannot allow jmp to overlay anything smaller than its length.

Block analysis should allow you to detect all jmp targets which become 
the boundaries of the blocks. Thus, inserting any instruction in any 
block is harmless provided you do not cross the block boundaries because 
a jump target cannot fall _within_ the block. Does this answer your 
implicit question about the usefulness of the block analysis ?
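Once block boundaries are known, the property above reduces to a simple check. A 5-byte jmp written at `addr` is acceptable if no boundary lies strictly inside `[addr, addr + 5)` and the window stays inside the function. In Karim's goto example, "label:" is a boundary, so a jmp over the single-byte instruction before it is rejected. This is a sketch with hypothetical names. It assumes `addr` is itself an instruction start and that the boundary list is complete, which is the hard part when indirect jumps are present.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* boundaries[]: start addresses of basic blocks (function entry, jump
 * targets, instructions following calls/jumps/rets).
 * func_end: first address past the function. */
static bool patch_stays_in_block(const uintptr_t *boundaries, size_t n,
                                 uintptr_t func_end,
                                 uintptr_t addr, size_t patch_len)
{
    if (addr + patch_len > func_end)
        return false;   /* jmp would spill past the function */
    for (size_t i = 0; i < n; i++)
        if (boundaries[i] > addr && boundaries[i] < addr + patch_len)
            return false;   /* some jump may land inside the jmp */
    return true;
}
```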

Of course, the question becomes: how do you detect all jmp targets when 
some of them are indirect jumps? I did spend quite a bit of time trying 
to answer this question. So far, it is clear to me that non-PIC code 
contains very few indirect jumps, so you should be able to get close to 
90% function coverage with a simple block analysis taking into account 
only direct absolute and relative jumps. However, the hard part comes 
when you want to deal with functions which contain a switch statement. 
Parsing this last 10% of functions represents a lot more 
work than what can be achieved with simple analysis.

The simple perl analysis code can be useful if you want to convince 
yourself about the figures above. I also wrote some prototyping code to 
familiarize myself with the x86 ISA: I think the code should be able to 
correctly parse and report direct jumps as well as their targets. I have 
stopped efforts in this direction since I started looking into the 
harder question of indirect jumps. I believe that an answer to the 
indirect jump question requires a real analysis of the code which means 
being able to perform constant propagation as well as dead code 
elimination passes on an intermediate representation of the code to be 
able to infer the location of the indirect jump tables as well as their 
size statically. I have a start of a framework to do this sort of stuff 
but nothing of practical interest to anyone.

Note that the above specifically ignores the issue of indirect _calls_ 
because I assume they are not able to call in the middle of another 
function.

For now, you can find my mostly finished C prototype in there: 
http://cutebugs.net/code/bozo-profiler/?cmd=manifest;manifest=fff970e1713fe16f8f637cd2065d2287f1d162d6;path=/libdebug/
Look for the files x86-opcode.c/h. I also started writing 
x86-opcode-print.c/h to debug the previous code but I stopped halfway.

 >So are we agreed that djprobe only operates under x86 on instructions >= 5
 >bytes?

I think this is the safe assumption you could fall back to if you found 
yourself being unable to parse the basic blocks of a function because of 
an indirect jump. Should you think that such a block analysis is useful, 
I can clean up my parsing code and make it useful enough to detect the 
case where it fails and report all block boundaries otherwise.

regards,
Mathieu

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-08 18:33                                       ` mathieu lacage
@ 2005-10-08 21:59                                         ` Richard J Moore
  2005-10-08 23:24                                           ` Roland McGrath
  2005-10-09 16:47                                           ` mathieu lacage
  0 siblings, 2 replies; 83+ messages in thread
From: Richard J Moore @ 2005-10-08 21:59 UTC (permalink / raw)
  To: mathieu lacage; +Cc: systemtap






mathieu lacage <Mathieu.Lacage@sophia.inria.fr> wrote on 08/10/2005 19:33:13:

> hi richard,
>
>  >I don't see why block analysis is helpful. Unless one can guarantee fixing
>  >up all jmp to an instruction following the probed instruction then we
>  >simply cannot allow jmp to overlay anything smaller than its length.
>
> Block analysis should allow you to detect all jmp targets which become
> the boundaries of the blocks. Thus, inserting any instruction in any
> block is harmless provided you do not cross the block boundaries because
> a jump target cannot fall _within_ the block. Does this answer your
> implicit question about the usefulness of the block analysis ?
>

So the assumption here is that:
1) we are dealing with non-optimized code.
2) we are dealing with gcc generated code.

Whilst it's unlikely that compilers other than gcc are used, it's not
impossible - e.g. Intel's IA64 compiler. And the likelihood of non-gcc
compilers increases when we consider user-space probes. But also it is
possible that we might want to probe handcrafted assembler code in
kernel-space.

Are we able to guard against these exceptions automatically, or do we have
to disallow a jmp probe on instructions less than the jmp size?
You realise that if we get the probing mechanism wrong we will cause
bizarrely unpredictable results.


> Of course, the question becomes: how do you detect all jmp targets when
> some of them are indirect jumps. I did spend quite a bit of time trying
> to answer this question. So far, it is clear to me that non PIC code
> contains very few indirect jumps so you should be able to get close to
> 90% function coverage with a simple block analysis taking into account
> only direct absolute and relative jumps. However, the hard part comes
> when you want to deal with functions which contain a switch statement.
> Being able to parse this last 10% of functions represents a lot of more
> work than what can be achieved with simple analysis.
>
> The simple perl analysis code can be useful if you want to convince
> yourself about the figures above. I also wrote some prototyping code to
> familiarize myself with the x86 ISA: I think the code should be able to
> correctly parse and report direct jumps as well as their targets. I have
> stopped efforts in this direction since I started looking into the
> harder question of indirect jumps. I believe that an answer to the
> indirect jump question requires a real analysis of the code which means
> being able to perform constant propagation as well as dead code
> elimination passes on an intermediate representation of the code to be
> able to infer the location of the indirect jump tables as well as their
> size statically. I have a start of a framework to do this sort of stuff
> but nothing of practical interest to anyone.
>
> Note that the above specifically ignores the issue of indirect _calls_
> because I assume they are not able to call in the middle of another
> function.
>

Not sure about that. I think I can find an example of c-code for which it
is impossible to determine the function boundaries from the assembler code,
but looks perfectly reasonable from the C perspective.



> For now, you can find my mostly finished C prototype in there:
> http://cutebugs.net/code/bozo-profiler/?cmd=manifest;manifest=fff970e1713fe16f8f637cd2065d2287f1d162d6;path=/libdebug/
> Look for the files x86-opcode.c/h. I also started writing
> x86-opcode-print.c/h to debug the previous code but I stopped halfway.
>
>  >So are we agreed that djprobe only operates under x86 on instructions >= 5
>  >bytes?
>
> I think this is the safe assumption you could fall back to if you found
> yourself being unable to parse the basic blocks of a function because of
> an indirect jump. Should you think that such a block analysis is useful,
> I can cleanup my parsing code and make it useful enough to detect the
> case where it fails and report all block boundaries otherwise.
>
> regards,
> Mathieu
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-08 21:59                                         ` Richard J Moore
@ 2005-10-08 23:24                                           ` Roland McGrath
  2005-10-22 11:49                                             ` mathieu lacage
                                                               ` (2 more replies)
  2005-10-09 16:47                                           ` mathieu lacage
  1 sibling, 3 replies; 83+ messages in thread
From: Roland McGrath @ 2005-10-08 23:24 UTC (permalink / raw)
  To: Richard J Moore; +Cc: systemtap

Richard Henderson has done some work in both gcc and the assembler to emit
DWARF basic block markers.  The compiler knows what the jump targets are in
code it generates.  For inline assembly, the assembler supports emitting a
marker for every assembler label, and the compiler emits the assembler
directive to enable the assembler's automatic markers around the inline
assembly code it copies out of your `asm' statements.  When the tools with
this support are available (not yet), this will cover everything we see in
the kernel, and most user applications built in normal ways, when they are
built with DWARF info.  

The basic block markers are in the standard DWARF format specification.
(They are an optional part of the same encoding that provides source line
number information.)  Other compilers that, like gcc, have not heretofore
implemented this, can do so just as well as we can.  

Of course hand-crafted or specially-generated code could always use
computed jumps to locations not marked with assembler labels.  It just
seems very unlikely.  In the case of the kernel, we can be sure enough what
the set of code is and know that there isn't any funny business beyond the
ken of our automatic tools.  For the most general worst case, probably one
does have to be so conservative that you can only use jmp insertion on
unusually large instructions.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-08 21:59                                         ` Richard J Moore
  2005-10-08 23:24                                           ` Roland McGrath
@ 2005-10-09 16:47                                           ` mathieu lacage
  1 sibling, 0 replies; 83+ messages in thread
From: mathieu lacage @ 2005-10-09 16:47 UTC (permalink / raw)
  To: Richard J Moore; +Cc: systemtap

hi,

>So the assumption here is that:
>1) we are dealing with non-optimized code.

Sorry? Where did I say that?

>2) we are dealing with gcc generated code.

I did not say that either.

Here is the proposed algorithm:
1) have a function which can tell you the length of an instruction based 
on a pointer to the start of the instruction. This is pretty horrible to 
get right on x86 but it is quite possible and my sample code shows this.
2) have a function which can tell you if an instruction is one of:
  - a direct or indirect call
  - a ret
  - a direct relative or absolute jump
  - an indirect relative or absolute jump
3) input of the algorithm is the start and end address of a function. 
For each instruction located between start and end, execute 4, 5, 6, and 7.
4) for each direct or indirect call, mark the following instruction as a 
block boundary
5) for each ret, mark the following instruction as a block boundary
6) for each direct relative or absolute jump, mark both the following 
instruction and the jump target as block boundaries
7) for each indirect relative or absolute jump, mark the function as 
non-parseable.
8) once you have executed 3 and if you have not stumbled upon 7), you 
have a list of all the instructions which are basic block boundaries, 
which means you have solved the problem. End of story.

If you have hit 7), you can only place probes on instructions that are at 
least 5 bytes long. Otherwise, you can place probes anywhere in blocks bigger 
than 5 bytes.
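The algorithm above can be sketched in C over instructions that have already been decoded and classified (steps 1 and 2, the genuinely hard x86 length decoder, are assumed done). Following the earlier point in this thread that jump targets become block boundaries, the sketch marks the target of each direct jump as well as the instruction after it. It also treats a target that lands inside the function but not on an instruction start as non-parseable; that check is an addition. The struct layout and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_LEN 5

enum insn_kind { INSN_OTHER, INSN_CALL, INSN_RET, INSN_JMP_DIRECT, INSN_JMP_INDIRECT };

/* One decoded instruction (output of steps 1 and 2). */
struct insn {
    uint32_t addr;
    uint8_t len;
    enum insn_kind kind;
    uint32_t target;   /* only used for INSN_JMP_DIRECT */
};

/* Steps 3-8: set is_boundary[i] when code[i] starts a basic block.
 * Returns false when the function is non-parseable. */
static bool find_block_starts(const struct insn *code, size_t n, bool *is_boundary)
{
    if (n == 0)
        return true;
    uint32_t start = code[0].addr;
    uint32_t end = code[n - 1].addr + code[n - 1].len;

    for (size_t i = 0; i < n; i++)
        is_boundary[i] = (i == 0);   /* the function entry starts a block */

    for (size_t i = 0; i < n; i++) {
        switch (code[i].kind) {
        case INSN_JMP_INDIRECT:
            return false;            /* step 7: targets unknown */
        case INSN_JMP_DIRECT: {
            bool found = false;
            for (size_t j = 0; j < n; j++) {
                if (code[j].addr == code[i].target) {
                    is_boundary[j] = true;   /* jump target starts a block */
                    found = true;
                }
            }
            if (!found && code[i].target >= start && code[i].target < end)
                return false;        /* target inside an instruction */
        }
            /* fall through */
        case INSN_CALL:
        case INSN_RET:
            if (i + 1 < n)
                is_boundary[i + 1] = true;   /* steps 4-6 */
            break;
        case INSN_OTHER:
            break;
        }
    }
    return true;
}

/* Can a 5-byte jmp be written at instruction k? */
static bool can_place_jmp(const struct insn *code, size_t n, const bool *is_boundary,
                          bool parseable, size_t k)
{
    if (!parseable)
        return code[k].len >= JMP_LEN;   /* fallback: one large instruction */
    uint32_t end = code[k].addr + JMP_LEN;
    for (size_t j = k + 1; j < n && code[j].addr < end; j++)
        if (is_boundary[j])
            return false;                /* window would cross a block start */
    return end <= code[n - 1].addr + code[n - 1].len;
}
```

Relocating the instructions covered by the jmp into the out-of-line buffer is a separate problem and is not shown.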

None of the items presented above rely on code being generated by gcc or 
specific optimization levels being used.

>Whilst it's unlikely that compilers other than gcc are used, it's not
>impossible - e.g. Intel's IA64 compiler. And the likelihood of non-gcc
>compilers increases when we consider user-space probes. But also it is
>  
>
Placing probes in userspace will simply increase the probability that you 
have to fall back to the 5-bytes-per-instruction mechanism, because a lot of 
userspace code is built with -fPIC, which increases the probability of 
finding indirect jumps.

Should you be interested in these probabilities, I can come up quite 
easily with definite numbers on a number of linux-standard applications. 
Which applications are you interested in ?

>Are we able to guard against these exceptions automatically, or do we have
>  
>
The detection of the "bad case" (i.e., indirect jumps) is automatic and 
inherent to the proposed algorithm, which means that the fallback to 
5-byte instructions is automatic.

[snip]

>Not sure about that. I think I can find an example of c-code for which it
>is impossible to determine the function boundaries from the assembler code,
>but looks perfectly reasonable from the C perspective.
>  
>
Oh, well, of course, you can do that. Detecting function boundaries is 
really hard. However, one of the major assumptions here is that you have 
access to the debugging information which gives you these function 
boundaries. If this assumption is not valid, then I don't think the idea 
of parsing basic block boundaries is reasonable for the application you 
are interested in.

Mathieu

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-08 23:24                                           ` Roland McGrath
@ 2005-10-22 11:49                                             ` mathieu lacage
  2005-10-22 22:09                                               ` Roland McGrath
       [not found]                                             ` <43621B0D.70204@sophia.inria.fr>
  2005-11-08  9:49                                             ` Richard J Moore
  2 siblings, 1 reply; 83+ messages in thread
From: mathieu lacage @ 2005-10-22 11:49 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Richard J Moore, systemtap

Roland McGrath wrote:

>Richard Henderson has done some work in both gcc and the assembler to emit
>DWARF basic block markers.  The compiler knows what the jump targets are in
>  
>
hi roland,

Is there a special version available of these tools with this work 
included ?

Mathieu

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-22 11:49                                             ` mathieu lacage
@ 2005-10-22 22:09                                               ` Roland McGrath
  2005-10-24  6:33                                                 ` Mathieu Lacage
  0 siblings, 1 reply; 83+ messages in thread
From: Roland McGrath @ 2005-10-22 22:09 UTC (permalink / raw)
  To: mathieu lacage; +Cc: systemtap

> Roland McGrath wrote:
> 
> >Richard Henderson has done some work in both gcc and the assembler to emit
> >DWARF basic block markers.  The compiler knows what the jump targets are in
> >  
> >
> hi roland,
> 
> Is there a special version available of these tools with this work 
> included ?

The work is still in progress.  The assembler work is already in binutils
development CVS.  I don't think there has been any release including it yet.
As far as I know, the gcc work has not been committed anywhere and I don't
really know how complete it is.  Richard and I have it as a back-burner
task (him implementing the compiler/assembler parts and me implementing
the consumer code and testing the compiler's results), but it's not really
scheduled work with a delivery target at this point.  

Thanks,
Roland

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-22 22:09                                               ` Roland McGrath
@ 2005-10-24  6:33                                                 ` Mathieu Lacage
  2005-10-24 19:48                                                   ` Roland McGrath
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Lacage @ 2005-10-24  6:33 UTC (permalink / raw)
  To: Roland McGrath; +Cc: systemtap

hi roland,

On Sat, 2005-10-22 at 15:09 -0700, Roland McGrath wrote:
> The work is still in progress.  The assembler work is already in binutils
> development CVS.  I don't think there has been any release including it yet.
> As far as I know, the gcc work has not been committed anywhere and I don't
> really know how complete it is.  Richard and I have it as a back-burner

Out of curiosity, I looked at the dwarf2 line info spec to have an idea
of what this bb stuff looks like. Am I right in assuming that the only
information given is the fact that various source code lines constitute
the start of a basic block?

Here is a simple example:

while (bar != x) {do_foo (); bar++;}

i.e., here you have something like this:
start:
cmp bar, x
je end
call do_foo
inc bar
jmp start
end:

bb1:
start:
cmp bar, x
je end

bb2:
call do_foo

bb3:
inc bar
jmp start

The only thing the dwarf2 info would represent is the fact that the
source code line represents the start of bb1 and would not make any
mention of bb2 or bb3. Am I right?

Mathieu
-- 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-24  6:33                                                 ` Mathieu Lacage
@ 2005-10-24 19:48                                                   ` Roland McGrath
  0 siblings, 0 replies; 83+ messages in thread
From: Roland McGrath @ 2005-10-24 19:48 UTC (permalink / raw)
  To: Mathieu Lacage; +Cc: systemtap

It is not the case that there is always only one DWARF line record for each
source line number, if that is what you mean.
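To illustrate Roland's point, here is a minimal sketch of the DWARF line-number state machine reduced to four standard opcodes. The basic_block flag belongs to each emitted row, not to a source line, so several rows for the same line can each mark the start of a block (which is how bb2 and bb3 in the loop example above could be described). The pre-decoded opcode form, the struct layouts and the name run_line_program are assumptions made for illustration; real .debug_line data uses LEB128 operands and special opcodes, which this sketch skips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard opcode values from the DWARF line-number program. */
enum {
    DW_LNS_copy            = 0x01,
    DW_LNS_advance_pc      = 0x02,
    DW_LNS_advance_line    = 0x03,
    DW_LNS_set_basic_block = 0x07
};

/* Hypothetical pre-decoded opcode: real programs encode operands as LEB128
   and also use special opcodes, both ignored here. */
struct line_op { int opcode; long operand; };

/* One row of the resulting line table. */
struct line_row { unsigned long address; long line; bool basic_block; };

/* Run the simplified state machine and return the number of rows emitted.
   minimum_instruction_length is assumed to be 1, as on x86. */
size_t run_line_program(const struct line_op *ops, size_t nops,
                        struct line_row *rows, size_t maxrows)
{
    unsigned long address = 0;
    long line = 1;
    bool basic_block = false;
    size_t nrows = 0;

    for (size_t i = 0; i < nops; i++) {
        switch (ops[i].opcode) {
        case DW_LNS_copy:
            assert(nrows < maxrows);
            rows[nrows].address = address;
            rows[nrows].line = line;
            rows[nrows].basic_block = basic_block;
            nrows++;
            basic_block = false;   /* the flag is cleared after every row */
            break;
        case DW_LNS_advance_pc:
            address += (unsigned long)ops[i].operand;
            break;
        case DW_LNS_advance_line:
            line += ops[i].operand;
            break;
        case DW_LNS_set_basic_block:
            basic_block = true;
            break;
        default:
            break;                 /* other opcodes are out of scope here */
        }
    }
    return nrows;
}
```

Feeding it the sequence set_basic_block, copy, advance_pc 6, set_basic_block, copy, advance_pc 5, copy produces three rows for line 1, at addresses 0x0, 0x6 and 0xb, with only the first two flagged as block starts.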

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
       [not found]                                             ` <43621B0D.70204@sophia.inria.fr>
@ 2005-11-07 10:04                                               ` mathieu lacage
  2005-11-07 10:06                                                 ` mathieu lacage
  0 siblings, 1 reply; 83+ messages in thread
From: mathieu lacage @ 2005-11-07 10:04 UTC (permalink / raw)
  To: mathieu lacage; +Cc: systemtap

hi,

I was interested enough that I hacked together the attached patch (I 
used gcc svn HEAD and binutils cvs HEAD). It seems to work nicely with 
-O0 but it seems to break on the simple testcase below with -O2/-O3 on 
x86: it reports a bb boundary at 0x11, and I cannot see why there 
would be one there.

Also, I noticed that if you want to use this code to verify djprobe 
insertion locations, you will still need to parse the resulting binary 
to find the call sites. These are not real bb boundaries, so the patch 
does not report them, but they are still forbidden locations for 
djprobe insertion.

I will do a lot more testing in a few days but I thought it might be 
useful to send an early report.

regards,
Mathieu

#include <stdio.h>
static int foo (void)
{
        if (3) {
                int i = 0;
                while (i < 100) {
                        printf ("test\n");
                        i++;
                }
        }
        return 8;
}

int main (int argc, char *argv[])
{
        foo ();
        return 0;
}

Here, I get the following list of basic blocks with the debugging 
information:
ad: 0x0
ad: 0x11
ad: 0x20
ad: 0x32
while the assembly output does not seem to contain any jump to 0x11:
00000000 <main>:
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  0xfffffffc(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   53                      push   %ebx
   e:   31 db                   xor    %ebx,%ebx
  10:   51                      push   %ecx
  11:   83 ec 10                sub    $0x10,%esp
  14:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  1a:   8d bf 00 00 00 00       lea    0x0(%edi),%edi
  20:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  27:   43                      inc    %ebx
  28:   e8 fc ff ff ff          call   29 <main+0x29>
  2d:   83 fb 64                cmp    $0x64,%ebx
  30:   75 ee                   jne    20 <main+0x20>
  32:   83 c4 10                add    $0x10,%esp
  35:   31 c0                   xor    %eax,%eax
  37:   59                      pop    %ecx
  38:   5b                      pop    %ebx
  39:   5d                      pop    %ebp
  3a:   8d 61 fc                lea    0xfffffffc(%ecx),%esp
  3d:   c3                      ret
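One way to check the report above by hand is sketched below. The instruction table is transcribed from the listing, with each instruction classified manually (a real tool would get this from a disassembler), and the usual definition of a basic-block leader is applied: function entry, branch target, or the instruction after a branch. Under that definition 0x11 is not a leader, while 0x2d, the return address of the call at 0x28, is flagged separately as a spot a multi-byte jmp must not straddle. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum insn_kind { INSN_PLAIN, INSN_CALL, INSN_BRANCH, INSN_RET };

/* Address, length, kind and (for branches only) target address. */
struct insn { unsigned addr, len; enum insn_kind kind; unsigned target; };

/* main() from the listing above, classified by hand. */
static const struct insn main_insns[] = {
    {0x00, 4, INSN_PLAIN, 0}, {0x04, 3, INSN_PLAIN, 0}, {0x07, 3, INSN_PLAIN, 0},
    {0x0a, 1, INSN_PLAIN, 0}, {0x0b, 2, INSN_PLAIN, 0}, {0x0d, 1, INSN_PLAIN, 0},
    {0x0e, 2, INSN_PLAIN, 0}, {0x10, 1, INSN_PLAIN, 0}, {0x11, 3, INSN_PLAIN, 0},
    {0x14, 6, INSN_PLAIN, 0}, {0x1a, 6, INSN_PLAIN, 0}, {0x20, 7, INSN_PLAIN, 0},
    {0x27, 1, INSN_PLAIN, 0}, {0x28, 5, INSN_CALL, 0},  {0x2d, 3, INSN_PLAIN, 0},
    {0x30, 2, INSN_BRANCH, 0x20},                       {0x32, 3, INSN_PLAIN, 0},
    {0x35, 2, INSN_PLAIN, 0}, {0x37, 1, INSN_PLAIN, 0}, {0x38, 1, INSN_PLAIN, 0},
    {0x39, 1, INSN_PLAIN, 0}, {0x3a, 3, INSN_PLAIN, 0}, {0x3d, 1, INSN_RET, 0},
};
#define N_INSNS (sizeof main_insns / sizeof main_insns[0])

/* Leader: the entry point, any branch target, or the insn after a branch. */
bool is_leader(const struct insn *v, size_t n, unsigned addr)
{
    if (n > 0 && addr == v[0].addr)
        return true;
    for (size_t i = 0; i < n; i++) {
        if (v[i].kind == INSN_BRANCH && v[i].target == addr)
            return true;
        if (v[i].kind == INSN_BRANCH && i + 1 < n && v[i + 1].addr == addr)
            return true;
    }
    return false;
}

/* Return site: the insn after a call.  Not a leader, but a task sleeping in
   the callee resumes here, so a probe jmp must not cover it from behind. */
bool is_return_site(const struct insn *v, size_t n, unsigned addr)
{
    for (size_t i = 0; i + 1 < n; i++)
        if (v[i].kind == INSN_CALL && v[i + 1].addr == addr)
            return true;
    return false;
}
```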



>> Richard Henderson has done some work in both gcc and the assembler to emit
>> DWARF basic block markers.  The compiler knows what the jump targets are in
>> code it generates.  For inline assembly, the assembler supports emitting a
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-11-07 10:04                                               ` mathieu lacage
@ 2005-11-07 10:06                                                 ` mathieu lacage
  0 siblings, 0 replies; 83+ messages in thread
From: mathieu lacage @ 2005-11-07 10:06 UTC (permalink / raw)
  To: systemtap

[-- Attachment #1: Type: text/plain, Size: 2897 bytes --]

of course, it is better with the patch really attached

sorry,
Mathieu



[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 4733 bytes --]

Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 106485)
+++ gcc/final.c	(working copy)
@@ -129,6 +129,8 @@
 static rtx debug_insn;
 rtx current_output_insn;
 
+int current_start_basic_block = 0;
+
 /* Line number of last NOTE.  */
 static int last_linenum;
 
@@ -1744,6 +1746,7 @@
 	  else
 	    *seen |= SEEN_BB;
 
+	  current_start_basic_block = 1;
 	  break;
 
 	case NOTE_INSN_EH_REGION_BEG:
@@ -2071,8 +2074,21 @@
 	   note in a row.  */
 	if (notice_source_line (insn))
 	  {
-	    (*debug_hooks->source_line) (last_linenum, last_filename);
+	    if (current_start_basic_block)
+	      {
+		current_start_basic_block = 0;
+		(*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	      }
+	    else 
+	      {
+		(*debug_hooks->source_line) (last_linenum, last_filename, 0);
+	      }
 	  }
+	else if (current_start_basic_block)
+	  {
+	    current_start_basic_block = 0;
+	    (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	  }
 
 	if (GET_CODE (body) == ASM_INPUT)
 	  {
@@ -2498,6 +2514,7 @@
 	current_output_insn = debug_insn = 0;
       }
     }
+
   return NEXT_INSN (insn);
 }
 \f
Index: gcc/debug.c
===================================================================
--- gcc/debug.c	(revision 106485)
+++ gcc/debug.c	(working copy)
@@ -33,7 +33,7 @@
   debug_nothing_int_int,	         /* begin_block */
   debug_nothing_int_int,	         /* end_block */
   debug_true_tree,		         /* ignore_block */
-  debug_nothing_int_charstar,	         /* source_line */
+  debug_nothing_int_charstar_int,	 /* source_line */
   debug_nothing_int_charstar,	         /* begin_prologue */
   debug_nothing_int_charstar,	         /* end_prologue */
   debug_nothing_int_charstar,	         /* end_epilogue */
@@ -94,6 +94,13 @@
 }
 
 void
+debug_nothing_int_charstar_int (unsigned int line ATTRIBUTE_UNUSED,
+				const char *text ATTRIBUTE_UNUSED,
+				unsigned int flags ATTRIBUTE_UNUSED)
+{
+}
+
+void
 debug_nothing_int (unsigned int line ATTRIBUTE_UNUSED)
 {
 }
Index: gcc/debug.h
===================================================================
--- gcc/debug.h	(revision 106485)
+++ gcc/debug.h	(working copy)
@@ -59,7 +59,7 @@
   bool (* ignore_block) (tree);
 
   /* Record a source file location at (FILE, LINE).  */
-  void (* source_line) (unsigned int line, const char *file);
+  void (* source_line) (unsigned int line, const char *file, unsigned int flags);
 
   /* Called at start of prologue code.  LINE is the first line in the
      function.  This has been given the same prototype as source_line,
@@ -129,12 +129,16 @@
   int start_end_main_source_file;
 };
 
+
+#define LINE_FLAG_BASIC_BLOCK ((unsigned int)1)
+
 extern const struct gcc_debug_hooks *debug_hooks;
 
 /* The do-nothing hooks.  */
 extern void debug_nothing_void (void);
 extern void debug_nothing_charstar (const char *);
 extern void debug_nothing_int_charstar (unsigned int, const char *);
+extern void debug_nothing_int_charstar_int (unsigned int, const char *, unsigned int flags);
 extern void debug_nothing_int (unsigned int);
 extern void debug_nothing_int_int (unsigned int, unsigned int);
 extern void debug_nothing_tree (tree);
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	(revision 106485)
+++ gcc/dwarf2out.c	(working copy)
@@ -69,7 +69,7 @@
 #include "input.h"
 
 #ifdef DWARF2_DEBUGGING_INFO
-static void dwarf2out_source_line (unsigned int, const char *);
+static void dwarf2out_source_line (unsigned int, const char *, unsigned int flags);
 #endif
 
 /* DWARF2 Abbreviation Glossary:
@@ -2510,7 +2510,7 @@
      prologue case, not the eh frame case.  */
 #ifdef DWARF2_DEBUGGING_INFO
   if (file)
-    dwarf2out_source_line (line, file);
+    dwarf2out_source_line (line, file, 0);
 #endif
 }
 
@@ -13534,7 +13534,7 @@
    'line_info_table' for later output of the .debug_line section.  */
 
 static void
-dwarf2out_source_line (unsigned int line, const char *filename)
+dwarf2out_source_line (unsigned int line, const char *filename, unsigned int flags)
 {
   if (debug_info_level >= DINFO_LEVEL_NORMAL
       && line != 0)
@@ -13553,7 +13553,14 @@
 	  file_num = maybe_emit_file (file_num);
 
 	  /* Emit the .loc directive understood by GNU as.  */
-	  fprintf (asm_out_file, "\t.loc %d %d 0\n", file_num, line);
+	  if (flags & LINE_FLAG_BASIC_BLOCK) 
+	    {
+	      fprintf (asm_out_file, "\t.loc %d %d 0 basic_block\n", file_num, line);
+	    }
+	  else
+	    {
+	      fprintf (asm_out_file, "\t.loc %d %d 0 ;#test\n", file_num, line);
+	    }
 
 	  /* Indicate that line number info exists.  */
 	  line_info_table_in_use++;

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-10-08 23:24                                           ` Roland McGrath
  2005-10-22 11:49                                             ` mathieu lacage
       [not found]                                             ` <43621B0D.70204@sophia.inria.fr>
@ 2005-11-08  9:49                                             ` Richard J Moore
  2 siblings, 0 replies; 83+ messages in thread
From: Richard J Moore @ 2005-11-08  9:49 UTC (permalink / raw)
  To: Roland McGrath; +Cc: systemtap

OK, that's fine as long as we have an option to force the interrupt form of
probe so that we can handle odd cases where necessary.

- -
Richard J Moore
IBM Linux Technology Centre

Roland McGrath <roland@redhat.com> wrote on 09/10/2005 00:23:

    To:      Richard J Moore/UK/IBM@IBMGB
    cc:      systemtap@sources.redhat.com
    Subject: Re: Hitachi djprobe mechanism

Richard Henderson has done some work in both gcc and the assembler to emit
DWARF basic block markers.  The compiler knows what the jump targets are in
code it generates.  For inline assembly, the assembler supports emitting a
marker for every assembler label, and the compiler emits the assembler
directive to enable the assembler's automatic markers around the inline
assembly code it copies out of your `asm' statements.  When the tools with
this support are available (not yet), this will cover everything we see in
the kernel, and most user applications built in normal ways, when they are
built with DWARF info.

The basic block markers are in the standard DWARF format specification.
(They are an optional part of the same encoding that provides source line
number information.)  Other compilers that, like gcc, have not heretofore
implemented this, can do so just as well as we can.

Of course hand-crafted or specially-generated code could always use
computed jumps to locations not marked with assembler labels.  It just
seems very unlikely.  In the case of the kernel, we can be sure enough what
the set of code is and know that there isn't any funny business beyond the
ken of our automatic tools.  For the most general worst case, one probably
has to be so conservative as to use jmp insertion only on unusually large
instructions.
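The policy described here, together with Richard's request for a way to force the interrupt form, could be expressed roughly as follows. This is a hypothetical sketch: choose_probe and its parameters are invented for illustration, bb_starts stands for marker addresses read from the DWARF line table, and it ignores call return sites and the relocation of the displaced instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define JMP_REL32_LEN 5   /* opcode 0xE9 plus a 32-bit displacement on x86 */

enum probe_kind { PROBE_INT3, PROBE_JMP };

/* Hypothetical policy: addr is the probe address, insn_len the length of the
   instruction there, bb_starts the known block-start markers for the
   function (valid only if have_bb_info), force_int3 a user override. */
enum probe_kind choose_probe(unsigned long addr, size_t insn_len,
                             const unsigned long *bb_starts, size_t nbb,
                             bool have_bb_info, bool force_int3)
{
    assert(nbb == 0 || bb_starts != NULL);

    if (force_int3)
        return PROBE_INT3;

    /* Conservative case: a single instruction large enough for the jmp
       cannot have a jump target in its interior. */
    if (insn_len >= JMP_REL32_LEN)
        return PROBE_JMP;

    /* Without markers we cannot rule out targets inside the window. */
    if (!have_bb_info)
        return PROBE_INT3;

    /* With markers: no other block may start inside (addr, addr + 5). */
    for (size_t i = 0; i < nbb; i++)
        if (bb_starts[i] > addr && bb_starts[i] < addr + JMP_REL32_LEN)
            return PROBE_INT3;

    return PROBE_JMP;
}
```

The ordering puts the user override first, then the instruction-size rule that needs no debug information, and only then the marker-based check.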

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-05 16:25       ` Mathieu Desnoyers
@ 2005-08-05 16:39         ` Andi Kleen
  0 siblings, 0 replies; 83+ messages in thread
From: Andi Kleen @ 2005-08-05 16:39 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Andi Kleen, systemtap

> No practical solution has been elaborated for a fully preemptible kernel. Even
> walking the task list checking for potential iret pointers is not a good
> idea: if you have a stopped process that has its instruction pointer exactly in
> the wrong area, then you may wait forever.

You only look at runnable processes of course.

> A per-CPU workqueue to make sure none is still in interrupt context would, I
> think, do the same as a non-maskable IPI but without the busy looping.

Non maskable IPI? What for?

Somehow I have the impression you're trying to do things in the most
complicated imaginable way. But that's not how things in Linux
are done normally.

-Andi

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-04 10:01     ` Andi Kleen
@ 2005-08-05 16:25       ` Mathieu Desnoyers
  2005-08-05 16:39         ` Andi Kleen
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-05 16:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: systemtap

* Andi Kleen (ak@suse.de) wrote:
> On Wed, Aug 03, 2005 at 08:26:56PM -0400, Mathieu Desnoyers wrote:
> > * Roland McGrath (roland@redhat.com) wrote:
> > > It's OK for probe insertion to be slow.  So why not use RCU to synchronize
> > > other processors?
> > > 
> > 
> > No,
> > 
> > The reason is we have no control over preemption disabling around the concerned
> > area.
> 
> Task list walking and checking of the IP of blocked processes can avoid that
> as earlier discussed. But it doesn't solve all problems with short instructions.
> IPI is the most practical one.
> 
> 

No practical solution has been elaborated for a fully preemptible kernel. Even
walking the task list checking for potential iret pointers is not a good
idea: if you have a stopped process that has its instruction pointer exactly in
the wrong area, then you may wait forever.

A per-CPU workqueue to make sure none is still in interrupt context would, I
think, do the same as a non-maskable IPI but without the busy looping.


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-04  0:28   ` Mathieu Desnoyers
@ 2005-08-04 10:01     ` Andi Kleen
  2005-08-05 16:25       ` Mathieu Desnoyers
  0 siblings, 1 reply; 83+ messages in thread
From: Andi Kleen @ 2005-08-04 10:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Roland McGrath, Keshavamurthy, Anil S, Andi Kleen,
	Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Richard J Moore, systemtap, sugita, Satoshi Oshima,
	michel.dagenais

On Wed, Aug 03, 2005 at 08:26:56PM -0400, Mathieu Desnoyers wrote:
> * Roland McGrath (roland@redhat.com) wrote:
> > It's OK for probe insertion to be slow.  So why not use RCU to synchronize
> > other processors?
> > 
> 
> No,
> 
> The reason is we have no control over preemption disabling around the concerned
> area.

Task list walking and checking of the IP of blocked processes can avoid that
as earlier discussed. But it doesn't solve all problems with short instructions.
IPI is the most practical one.

-Andi (who thinks this discussion is running in circles now) 
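A rough sketch of the check Andi refers to (walking the tasks and testing where each one would resume) follows. The task_ip structure and region_busy function are hypothetical; a real kernel version would have to read each task's saved register state, and this sketch deliberately ignores return addresses deeper in the stack, which is the case Karim raises elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task record: only the address the task will resume at. */
struct task_ip { unsigned long ip; };

/* True if some task would resume strictly inside the bytes about to be
   overwritten, i.e. in (start, start + len).  Resuming exactly at start is
   harmless once patching is complete, since the CPU then fetches the new
   instruction from its first byte. */
bool region_busy(const struct task_ip *tasks, size_t n,
                 unsigned long start, size_t len)
{
    assert(n == 0 || tasks != NULL);
    for (size_t i = 0; i < n; i++)
        if (tasks[i].ip > start && tasks[i].ip < start + len)
            return true;
    return false;
}
```

Mathieu's objection applies directly: if region_busy keeps returning true because a stopped task sits inside the window, an inserter that simply retries would wait forever.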

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 20:31 ` Roland McGrath
@ 2005-08-04  0:28   ` Mathieu Desnoyers
  2005-08-04 10:01     ` Andi Kleen
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-04  0:28 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Keshavamurthy, Anil S, Andi Kleen, Karim Yaghmour,
	Masami Hiramatsu, Masami Hiramatsu, Richard J Moore, systemtap,
	sugita, Satoshi Oshima, michel.dagenais

* Roland McGrath (roland@redhat.com) wrote:
> It's OK for probe insertion to be slow.  So why not use RCU to synchronize
> other processors?
> 

No,

The reason is we have no control over preemption disabling around the concerned
area.

It makes no sense to wait for a period of time that guarantees that all other
processors will have scheduled to something else when, in fact, it is possible
to schedule in and out from the critical section.


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 13:18     ` Mathieu Desnoyers
@ 2005-08-02  7:07       ` Mathieu Lacage
  0 siblings, 0 replies; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-02  7:07 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: systemtap

On Mon, 2005-08-01 at 09:17 -0400, Mathieu Desnoyers wrote:

[snip]

> > If I read the djprobe documentation well and if I assume that
> > inserting/removing the probe can be done safely, independently of how
> > many bytes I overwrite in the source function, the rules, for now, are
> > rather simple.

[snip]

> If you follow the discussions on the SystemTap mailing list, you will find out
> that any instruction smaller than 5 bytes is a bad thing to overwrite
> (interrupts and preemption problems, as well as CPU instruction cache coherency).
> Some of those cases (interrupts and instruction cache coherency) only show up on
> SMP machines (assuming the overwriting code would return in the modified path
> through an interruption on UP, which is plausible).

I specifically stated above that I assumed that inserting and removing
the probe was a solved problem. Maybe this assumption is just not
workable (even with a very high runtime cost) without also assuming that
the overwritten instructions are at least 5 bytes long. I have no idea,
but I must say that I am not really interested in the process of
inserting and removing the probe. I am merely trying to figure out what
the other constraints on probe location look like.

Right now, I think the only constraint you have for the placement of
probes is that you need to insert the probe in a basic block which is
at least 5 bytes long. This should solve the problem raised by Karim:

        if (...)
                goto label;
        <more code>
        single_byte_asm_instruction_code();
label:
        foo();

The problem boils down to calculating the basic blocks for a function
and then calculating whether or not this basic block is large enough to
insert a jump in it (even if you want to probe the end of the bb, you
can insert the probe at the start of the bb because, by definition, all
instructions in the bb will execute if any of them executes)
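Assuming the block start addresses are already known (from DWARF markers or a disassembler pass), the size check described above reduces to a scan over consecutive starts. This sketch, with the hypothetical helper count_small_blocks, also counts how many blocks of one function are too short for a 5-byte jmp; it is only an illustration, not the script Mathieu mentions.

```c
#include <assert.h>
#include <stddef.h>

#define JMP_REL32_LEN 5   /* size of an x86 jmp rel32 */

/* starts[] holds the sorted block start addresses of one function and
   func_end the address just past its last byte.  Returns how many blocks
   are shorter than a 5-byte jmp and stores the start of the first block
   that can hold one in *first_fit (or -1 if none can). */
size_t count_small_blocks(const unsigned long *starts, size_t n,
                          unsigned long func_end, long *first_fit)
{
    size_t small = 0;

    *first_fit = -1;
    for (size_t i = 0; i < n; i++) {
        unsigned long end = (i + 1 < n) ? starts[i + 1] : func_end;

        assert(end > starts[i]);          /* starts must be sorted and unique */
        if (end - starts[i] < JMP_REL32_LEN)
            small++;
        else if (*first_fit < 0)
            *first_fit = (long)starts[i];
    }
    return small;
}
```

For starts {0x0, 0x3, 0x8} and an end of 0xa, the blocks are 3, 5 and 2 bytes long, so it reports two blocks that are too small and 0x3 as the first block that can hold the jmp.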

Of course, it should be quite possible to work around this limitation by
performing relocation on multiple basic blocks (as suggested by Karim in
one of his emails) but I am not sure the complexity of doing this would
really gain much. 

The real question is: "how many basic blocks in a program are smaller
than 5 bytes?" I suspect that the answer will look much better than
the statistics reported by Karim on the size of instructions. I will
send a small script here which evaluates this asap.

regards,
Mathieu
-- 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-02  3:21 ` Roland McGrath
@ 2005-08-02  3:35   ` Karim Yaghmour
  0 siblings, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-02  3:35 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Keshavamurthy, Anil S, Mathieu Desnoyers, Andi Kleen,
	Masami Hiramatsu, Masami Hiramatsu, Richard J Moore, systemtap,
	sugita, Satoshi Oshima, michel.dagenais


Roland McGrath wrote:
> At this point, you know that no CPU's PC can get into the probe-insertion
> area without hitting the int3.  There is no danger of "half baked"
> instruction decoding because any CPU getting there hits the breakpoint and
> enters an explicit synchronization path through kprobes infrastructure code.  
> A CPU that hits this breakpoint can either wait for the probe inserter to
> finish, or it could just handle it in kprobes style and move on if the
> instruction following the one copied by kprobes is outside the mutation area.
> 
> You store the remaining bytes of the probe jmp instruction.  Then store the
> first byte, replacing the int3.  Then let any synchronized CPUs continue;
> they could either resume kprobe-style processing, or back up the PC and
> restart to allow the new probe-inserted jmp to happen.

Possibly works if you're operating on instructions of 5 bytes or more only.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
  2005-08-01 22:41 Keshavamurthy, Anil S
@ 2005-08-02  3:21 ` Roland McGrath
  2005-08-02  3:35   ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Roland McGrath @ 2005-08-02  3:21 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Mathieu Desnoyers, Andi Kleen, Karim Yaghmour, Masami Hiramatsu,
	Masami Hiramatsu, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

> Yes, slow is okay as long as you are *not* taking the full 
> system down during probe inserting/removal time. Halting full 
> system during probe insertion/removal might not be acceptable 
> at least on IA64.

This particular discussion has been entirely about picayune details of x86
instruction encoding (and x86-64).  So specific issues about other
architectures are really not apropos.  At any rate, I did not suggest
anything involving full-stop.

If I gleaned correctly from the discussion, the CPU bugs do not affect
single-byte int3 insertions.  We can presume they either don't or will be
fixed, since kprobes and all debuggers rely on that already.

What I was suggesting is as follows.  (This may well be exactly what
kerninst already did, it's been a while since I looked at those papers.)
This does not attempt to address the CONFIG_PREEMPT case.

Insert int3.  Use RCU to wait until all CPUs are known to be executing after
this insertion.  This need not hold lots of locks or eat CPUs to synchronize, it
can just cause the probe inserter to sleep until RCU completes or suchlike.

At this point, you know that no CPU's PC can get into the probe-insertion
area without hitting the int3.  There is no danger of "half baked"
instruction decoding because any CPU getting there hits the breakpoint and
enters an explicit synchronization path through kprobes infrastructure code.  
A CPU that hits this breakpoint can either wait for the probe inserter to
finish, or it could just handle it in kprobes style and move on if the
instruction following the one copied by kprobes is outside the mutation area.

You store the remaining bytes of the probe jmp instruction.  Then store the
first byte, replacing the int3.  Then let any synchronized CPUs continue;
they could either resume kprobe-style processing, or back up the PC and
restart to allow the new probe-inserted jmp to happen.
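The byte-level ordering of that sequence can be sketched on an ordinary buffer. This only demonstrates the order of the stores and the rel32 arithmetic (the displacement is measured from the end of the 5-byte jmp); wait_for_all_cpus is a hypothetical placeholder for the RCU-style wait, and a real kernel implementation would also need a proper text-patching primitive and instruction-stream serialization, which plain C stores do not provide.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC
#define JMP_OPCODE  0xE9   /* jmp rel32 */

/* Hypothetical placeholder for "wait until every CPU has passed a quiescent
   state" (the RCU step); it does nothing in this sketch. */
static void wait_for_all_cpus(void) { }

/* Patch code[] (assumed to live at address 'addr') so that it begins with
   a jmp rel32 to 'target', following the int3-first ordering. */
void insert_jmp_probe(uint8_t *code, uint32_t addr, uint32_t target)
{
    /* Unsigned arithmetic wraps mod 2^32, giving the two's complement
       displacement relative to the instruction after the jmp. */
    uint32_t rel = target - (addr + 5u);

    /* Step 1: single-byte store; any CPU reaching addr now traps. */
    code[0] = INT3_OPCODE;

    /* Step 2: make sure no CPU is still executing the old bytes. */
    wait_for_all_cpus();

    /* Step 3: write the 4-byte little-endian displacement behind the int3. */
    for (int i = 0; i < 4; i++)
        code[1 + i] = (uint8_t)(rel >> (8 * i));

    /* Step 4: replace the int3 with the jmp opcode, completing the jmp. */
    code[0] = JMP_OPCODE;
}
```

Patching address 0x1000 to jump to 0x2000 yields the bytes e9 fb 0f 00 00, since 0x2000 - 0x1005 = 0xffb; bytes past the fifth are left untouched.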

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 23:05 ` Karim Yaghmour
@ 2005-08-01 23:18   ` Karim Yaghmour
  0 siblings, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-01 23:18 UTC (permalink / raw)
  To: karim
  Cc: Keshavamurthy, Anil S, Satoshi Oshima, Richard J Moore,
	systemtap, Andi Kleen, Mathieu Desnoyers, Masami Hiramatsu,
	Masami Hiramatsu, michel.dagenais, Roland McGrath, sugita


err ...

Karim Yaghmour wrote:
> figure out whether something on a stack is a pointer or an address.

"data or an address" is what I meant ...

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 22:49 Keshavamurthy, Anil S
@ 2005-08-01 23:05 ` Karim Yaghmour
  2005-08-01 23:18   ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-01 23:05 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Satoshi Oshima, Richard J Moore, systemtap, Andi Kleen,
	Mathieu Desnoyers, Masami Hiramatsu, Masami Hiramatsu,
	michel.dagenais, Roland McGrath, sugita


Keshavamurthy, Anil S wrote:
> Can be done provided you take care of all the issues that have been
> discussed on this mailing list.
> Let's all wait for Hitachi's djprobe patch to show up before we can
> comment further on this topic.

I wish it were that simple. As far as I can see, based on the
answers I have received, it seems that the implications of some
of the assumptions being made have not been seriously thought through.

If nothing else, I just don't see how one can claim to be able to
figure out whether something on a stack is a pointer or an address.

A patch won't change things like that.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
@ 2005-08-01 22:49 Keshavamurthy, Anil S
  2005-08-01 23:05 ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-08-01 22:49 UTC (permalink / raw)
  To: karim
  Cc: Satoshi Oshima, Richard J Moore, systemtap, Andi Kleen,
	Mathieu Desnoyers, Masami Hiramatsu, Masami Hiramatsu,
	michel.dagenais, Roland McGrath, sugita

>Does this mean that you think we could use djprobe on anything less
>than 5 bytes?
Can be done provided you take care of all the issues that have been
discussed on this mailing list.
Let's all wait for Hitachi's djprobe patch to show up before we can
comment further on this topic.

Cheers,
-Anil

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
@ 2005-08-01 22:41 Keshavamurthy, Anil S
  2005-08-02  3:21 ` Roland McGrath
  0 siblings, 1 reply; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-08-01 22:41 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Mathieu Desnoyers, Andi Kleen, Karim Yaghmour, Masami Hiramatsu,
	Masami Hiramatsu, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

>It's OK for probe insertion to be slow. 

Yes, slow is okay as long as you are *not* taking the full 
system down during probe inserting/removal time. Halting full 
system during probe insertion/removal might not be acceptable 
at least on IA64.

>So why not use RCU to synchronize
>other processors?

The thing here is that while we are modifying a section of code, say a
couple of instructions, to accommodate the 5-byte jmp instruction, we
don't want any CPU to be in the middle of this area. To be *really* sure
the other CPUs don't execute a half-baked instruction, we must park all
of them at a known location. So in this scenario, I don't see how RCU
synchronization can help.

If I am wrong, please educate me.

Cheers,
-Anil

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 20:46 Keshavamurthy, Anil S
@ 2005-08-01 21:08 ` Karim Yaghmour
  0 siblings, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-08-01 21:08 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Satoshi Oshima, Richard J Moore, systemtap, Andi Kleen,
	Mathieu Desnoyers, Masami Hiramatsu, Masami Hiramatsu,
	michel.dagenais, Roland McGrath, sugita


Keshavamurthy, Anil S wrote:
> So in effect, we just can't look for instruction size greater than 5
> bytes and insert probe there. 

True. This is why I was saying "... that's if there aren't other
limitations..."

> This djprobe will push the need for a stronger static
> analyzer/translator in selecting the probe point.

Does this mean that you think we could use djprobe on anything less
than 5 bytes?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
@ 2005-08-01 20:46 Keshavamurthy, Anil S
  2005-08-01 21:08 ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-08-01 20:46 UTC (permalink / raw)
  To: karim, Satoshi Oshima
  Cc: Richard J Moore, systemtap, Andi Kleen, Mathieu Desnoyers,
	Masami Hiramatsu, Masami Hiramatsu, michel.dagenais,
	Roland McGrath, sugita

>As far as I can see, it remains that the only safe way to use djprobe
>is to not touch any instruction that is less than 5 bytes, that's if
>there aren't other limitations as I mentioned earlier.

Though this is the safe way to insert djprobe, it might not always
serve the desired purpose.

Say, for example, a user wants to find out how many times a function
gets called and needs to insert a probe at the beginning of the function.
Due to the nature of djprobe:
1) we might not find a 5-byte instruction within this function, or
2) even if we find one, that instruction might not get executed on every
call (due to the code flow). In that case, if the user inserts the probe
at the first instruction of 5 bytes or more after the function entry, the
result (the number of times the function was entered) will be wrong.

So, in effect, we can't just look for an instruction of 5 bytes or more
and insert the probe there. Djprobe will increase the need for a stronger
static analyzer/translator for selecting the probe point.
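To make the objection concrete, here is a minimal user-space sketch in Python of the naive strategy: pick the first instruction that is at least 5 bytes long. It assumes instruction offsets and lengths have already been produced by some disassembler (not shown); the function name and the prologue are hypothetical. In this example the candidate sits 6 bytes past the function entry, so a probe there counts executions of that instruction, not entries into the function, and the scan says nothing about whether that instruction runs on every path.

```python
# User-space model: each instruction is (offset, length) relative to the
# function entry, as a disassembler (not shown) might report it.
JMP_REL32_LEN = 5  # x86 "jmp rel32": opcode 0xE9 plus a 4-byte displacement


def first_djprobe_candidate(insns):
    """Return the offset of the first instruction at least 5 bytes long,
    or None if the function contains no such instruction."""
    for offset, length in insns:
        if length >= JMP_REL32_LEN:
            return offset
    return None


# Hypothetical i386 prologue:
#   push %ebp        (55)              1 byte
#   mov  %esp,%ebp   (89 e5)           2 bytes
#   sub  $0x10,%esp  (83 ec 10)        3 bytes
#   mov  $0x1,%eax   (b8 01 00 00 00)  5 bytes
prologue = [(0, 1), (1, 2), (3, 3), (6, 5)]
print(first_djprobe_candidate(prologue))  # → 6, not the function entry (0)
```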

-thanks,
Anil

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
  2005-08-01 16:14 Keshavamurthy, Anil S
@ 2005-08-01 20:31 ` Roland McGrath
  2005-08-04  0:28   ` Mathieu Desnoyers
  0 siblings, 1 reply; 83+ messages in thread
From: Roland McGrath @ 2005-08-01 20:31 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Mathieu Desnoyers, Andi Kleen, Karim Yaghmour, Masami Hiramatsu,
	Masami Hiramatsu, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

It's OK for probe insertion to be slow.  So why not use RCU to synchronize
other processors?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
@ 2005-08-01 16:14 Keshavamurthy, Anil S
  2005-08-01 20:31 ` Roland McGrath
  0 siblings, 1 reply; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-08-01 16:14 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

>
>* Keshavamurthy, Anil S (anil.s.keshavamurthy@intel.com) wrote:
>> Andi and others,
>> 	Sending an IPI to every other CPU (all but self) and making them
>> *spin on a lock* during the modification will *freeze* the system.
>> Please do not *spin* inside an IPI.
>> 
>> My observation:
>> Here is what I discovered: CPU2 had taken read_lock(&tasklist_lock),
>> then entered the IPI and is now busy *spinning on a lock*.
>> CPU3 had called write_lock_irq(&tasklist_lock), which first disables
>> local irqs and preemption and then tries to acquire the lock already
>> taken by CPU2. Since CPU2 never releases this lock, as it is busy
>> spin-waiting, CPU3 never enters the IPI :-(
>> 
>
>Yep, I see the problem: you cannot control other locks that may have
>been taken by other CPUs with interrupts disabled.
>
>Is there any way to send a non-maskable IPI? This could solve this problem.


The only way I can think of is to use stop_machine_run(fn, data, cpu),
which freezes the machine on all CPUs and runs fn() on the given cpu,
which is what we want. This is slower than the IPI approach but
definitely much safer.

The only drawback is that this is a very heavyweight operation, and I am
not sure of its impact on a busy production system.
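The reason stop_machine_run() is safe is that every other CPU is parked at a known point while a single CPU rewrites the code. As a rough user-space analogy only (it does not call any kernel API), Python's `threading.Barrier` has the same shape: its `action` callable is run by one thread after all parties have called `wait()` and before any of them is released. The function name `stop_machine_model` and the byte values are hypothetical.

```python
import threading


def stop_machine_model(ncpus, code, new_bytes):
    """Park ncpus simulated CPUs at a barrier, patch `code` once all of
    them have arrived, then release them. Returns the bytes each simulated
    CPU observes at the patch site after it is released."""

    def patch():
        # Barrier action: executed by exactly one thread, after every
        # party has reached the barrier and before any is released.
        code[:len(new_bytes)] = new_bytes

    rendezvous = threading.Barrier(ncpus, action=patch)
    seen = [None] * ncpus

    def cpu(i):
        rendezvous.wait()  # every "CPU" stops at a known location
        seen[i] = bytes(code[:len(new_bytes)])  # runs only after the patch

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(ncpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


text = bytearray(b"\x55\x89\xe5\x83\xec")     # hypothetical original bytes
jmp = bytes([0xE9, 0x10, 0x00, 0x00, 0x00])  # jmp rel32 to a hypothetical buffer
print(stop_machine_model(4, text, jmp) == [jmp] * 4)  # → True
```

The cost Anil worries about is visible even in the model: no simulated CPU makes progress until the slowest one has reached the barrier.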

Thanks,
-Anil

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01 15:50 Keshavamurthy, Anil S
@ 2005-08-01 16:03 ` Mathieu Desnoyers
  0 siblings, 0 replies; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-01 16:03 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Andi Kleen, Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

* Keshavamurthy, Anil S (anil.s.keshavamurthy@intel.com) wrote:
> Andi and others,
> 	Sending an IPI to every other CPU (all but self) and making them *spin
> on a lock* during the modification will *freeze* the system. Please do
> not *spin* inside an IPI.
> 
> My observation:
> Here is what I discovered: CPU2 had taken read_lock(&tasklist_lock),
> then entered the IPI and is now busy *spinning on a lock*.
> CPU3 had called write_lock_irq(&tasklist_lock), which first disables
> local irqs and preemption and then tries to acquire the lock already
> taken by CPU2. Since CPU2 never releases this lock, as it is busy
> spin-waiting, CPU3 never enters the IPI :-(
> 

Yep, I see the problem: you cannot control other locks that may have been
taken by other CPUs with interrupts disabled.

Is there any way to send a non-maskable IPI? This could solve this problem.


Mathieu

> Cheers,
> -Anil
> 
> 
> 
> 
> 
> >-----Original Message-----
> >From: systemtap-owner@sources.redhat.com
> >[mailto:systemtap-owner@sources.redhat.com] On Behalf Of Andi Kleen
> >Sent: Monday, August 01, 2005 8:38 AM
> >To: Mathieu Desnoyers
> >Cc: Andi Kleen; Karim Yaghmour; Masami Hiramatsu; Masami Hiramatsu;
> >Roland McGrath; Richard J Moore; systemtap@sources.redhat.com;
> >sugita@sdl.hitachi.co.jp; Satoshi Oshima; michel.dagenais@polymtl.ca
> >Subject: Re: Hitachi djprobe mechanism
> >
> >On Sun, Jul 31, 2005 at 06:59:41PM -0400, Mathieu Desnoyers wrote:
> >> * Andi Kleen (ak@suse.de) wrote:
> >> > 
> >> > One way would be to just search the task list for any tasks blocked
> >> > with an IP inside the patched region. If there are any, wait for
> >> > another quiescent period.
> >> > 
> >> 
> >> If you stop the other CPUs' schedulers when you do that, then it's ok.
> >
> >You don't need to stop them; a snapshot of the task list is enough,
> >since you only care about preempted sleeping processes at a single
> >point in time.
> >
> >Anyway, this discussion is theoretical because the IPI approach
> >is probably better.
> >
> >> 
> >> I just thought of an interesting way to implement the IPI, which would
> >> work very well (and safely) for any case where the instruction to
> >> overwrite is >= 5 bytes. The idea:
> >> 
> >> - Send an IPI to each other CPU
> >>   IPI args: * address we plan to write to
> >>             * the new instruction we plan to write
> >>   (The IPI handler could then loop, reading the address until it
> >>   contains the new instruction.)
> >
> >Seems far too complicated; just make it spin on a lock during
> >the modification.
> >
> >
> >-Andi
> >
> 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
@ 2005-08-01 15:50 Keshavamurthy, Anil S
  2005-08-01 16:03 ` Mathieu Desnoyers
  0 siblings, 1 reply; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-08-01 15:50 UTC (permalink / raw)
  To: Andi Kleen, Mathieu Desnoyers
  Cc: Karim Yaghmour, Masami Hiramatsu, Masami Hiramatsu,
	Roland McGrath, Richard J Moore, systemtap, sugita,
	Satoshi Oshima, michel.dagenais

Andi and others,
	Sending an IPI to every other CPU (all but self) and making them *spin
on a lock* during the modification will *freeze* the system. Please do
not *spin* inside an IPI.

My observation:
Here is what I discovered: CPU2 had taken read_lock(&tasklist_lock),
then entered the IPI and is now busy *spinning on a lock*.
CPU3 had called write_lock_irq(&tasklist_lock), which first disables
local irqs and preemption and then tries to acquire the lock already
taken by CPU2. Since CPU2 never releases this lock, as it is busy
spin-waiting, CPU3 never enters the IPI :-(
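The report above describes a circular wait, which a wait-for graph makes explicit. This is a minimal Python sketch, not kernel code. The node name "patcher" is a hypothetical label for the CPU that sent the IPI and holds the lock the others spin on. The edge from it to CPU3 is also an assumption: it reads the scheme as having the sender wait for every CPU to enter the handler before it modifies the code and releases the lock.

```python
def find_cycle(waits_for):
    """Return the nodes of a cycle in a wait-for graph, or None.
    waits_for maps each CPU to the single CPU it is waiting on."""
    for start in waits_for:
        path = []
        node = start
        # Follow wait edges until we leave the graph or revisit a node.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


scenario = {
    "patcher": "CPU3",  # waits for CPU3 to enter the IPI handler
    "CPU3": "CPU2",     # irqs off in write_lock_irq(), waits for CPU2's read lock
    "CPU2": "patcher",  # spins in the IPI handler until the patcher's lock is freed
}
print(find_cycle(scenario))  # → ['patcher', 'CPU3', 'CPU2']
```

Every edge in the cycle is a busy wait with interrupts off on at least one side, so none of the three CPUs can ever break it.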

Cheers,
-Anil





>-----Original Message-----
>From: systemtap-owner@sources.redhat.com
>[mailto:systemtap-owner@sources.redhat.com] On Behalf Of Andi Kleen
>Sent: Monday, August 01, 2005 8:38 AM
>To: Mathieu Desnoyers
>Cc: Andi Kleen; Karim Yaghmour; Masami Hiramatsu; Masami Hiramatsu;
>Roland McGrath; Richard J Moore; systemtap@sources.redhat.com;
>sugita@sdl.hitachi.co.jp; Satoshi Oshima; michel.dagenais@polymtl.ca
>Subject: Re: Hitachi djprobe mechanism
>
>On Sun, Jul 31, 2005 at 06:59:41PM -0400, Mathieu Desnoyers wrote:
>> * Andi Kleen (ak@suse.de) wrote:
>> > 
>> > One way would be to just search the task list for any tasks blocked
>> > with an IP inside the patched region. If there are any, wait for
>> > another quiescent period.
>> > 
>> 
>> If you stop the other CPUs' schedulers when you do that, then it's ok.
>
>You don't need to stop them; a snapshot of the task list is enough,
>since you only care about preempted sleeping processes at a single
>point in time.
>
>Anyway, this discussion is theoretical because the IPI approach
>is probably better.
>
>> 
>> I just thought of an interesting way to implement the IPI, which would
>> work very well (and safely) for any case where the instruction to
>> overwrite is >= 5 bytes. The idea:
>> 
>> - Send an IPI to each other CPU
>>   IPI args: * address we plan to write to
>>             * the new instruction we plan to write
>>   (The IPI handler could then loop, reading the address until it
>>   contains the new instruction.)
>
>Seems far too complicated; just make it spin on a lock during
>the modification.
>
>
>-Andi
>
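Andi's quoted suggestion, checking a snapshot of the task list for saved instruction pointers that fall inside the patched region, reduces to a range test. This is a minimal Python sketch. It assumes the saved IPs have already been collected into a list (collecting them is not shown) and that the patch is a 5-byte jmp. The function name and addresses are hypothetical. It only covers tasks that are preempted or sleeping at the moment of the snapshot, which is the case Andi says matters.

```python
JMP_LEN = 5  # bytes overwritten by a jmp rel32


def region_is_quiescent(saved_ips, start, length=JMP_LEN):
    """Return True if no task in the snapshot would resume strictly inside
    [start, start + length). A task resuming exactly at `start` is fine:
    it will simply execute the new jmp."""
    return not any(start < ip < start + length for ip in saved_ips)


# Hypothetical saved instruction pointers of preempted/sleeping tasks.
snapshot = [0xC0101000, 0xC0100A12]
print(region_is_quiescent(snapshot, 0xC0100A10))  # → False: 0xC0100A12 is inside
print(region_is_quiescent(snapshot, 0xC0100B00))  # → True
```

If the check fails, the caller would wait for another quiescent period and take a fresh snapshot, as Andi describes.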

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-08-01  9:02   ` Mathieu Lacage
@ 2005-08-01 13:18     ` Mathieu Desnoyers
  2005-08-02  7:07       ` Mathieu Lacage
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-08-01 13:18 UTC (permalink / raw)
  To: Mathieu Lacage; +Cc: Frank Ch. Eigler, systemtap

* Mathieu Lacage (Mathieu.Lacage@sophia.inria.fr) wrote:
> On Thu, 2005-07-28 at 21:53 -0400, Frank Ch. Eigler wrote:
> > But that would render the facility nearly powerless.  Let us try
> > harder to characterize those cases where it can safely be used as an int3
> > substitute.
> 
> If I read the djprobe documentation correctly, and if I assume that
> inserting/removing the probe can be done safely, independently of how
> many bytes I overwrite in the source function, the rules, for now, are
> rather simple.
> 
> Let's say you want to insert a probe at location x. If there is no
> relative jmp, indirect call or ret instruction in [x, x+5), you can
> insert the probe at location x.
> 

If you follow the discussions on the SystemTap mailing list, you will find out
that overwriting any instruction smaller than 5 bytes is a bad thing
(interrupt and preemption problems, as well as CPU instruction cache coherency).
Some of those cases (interrupts and instruction cache coherency) only show up
on SMP machines (assuming the overwriting code would return into the modified
path through an interrupt on UP, which is plausible).

> The kerninst papers explain how to avoid the constraint on the "relative
> jmp" by relocating it in the allocated instruction buffer. I don't see an
> obvious flaw in it, so I assume it would work if there is a need
> to optimize this case.
> 
> I have probably missed other cases. Would someone who knows a lot more
> about this fill in the missing rules so that I can do a more interesting
> statistical analysis of the binaries on my system than simply counting
> the number of instructions of at least 5 bytes?
> 

I would tend to say that it seems difficult for now to overwrite instructions
smaller than 5 bytes. Or maybe someone has a brilliant idea?


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-29  1:53 ` Frank Ch. Eigler
@ 2005-08-01  9:02   ` Mathieu Lacage
  2005-08-01 13:18     ` Mathieu Desnoyers
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Lacage @ 2005-08-01  9:02 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Thu, 2005-07-28 at 21:53 -0400, Frank Ch. Eigler wrote:
> But that would render the facility nearly powerless.  Let us try
> harder to characterize those cases where it can safely be used as an int3
> substitute.

If I read the djprobe documentation correctly, and if I assume that
inserting/removing the probe can be done safely, independently of how
many bytes I overwrite in the source function, the rules, for now, are
rather simple.

Let's say you want to insert a probe at location x. If there is no
relative jmp, indirect call or ret instruction in [x, x+5), you can
insert the probe at location x.
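The rule above can be written down as a small checker. This is a minimal Python sketch. It assumes a disassembler (not shown) has already produced gap-free (offset, length, kind) tuples for the function in address order. The kind labels and function name are hypothetical. It encodes only the rule stated above, plus the requirement that x start an instruction. It does not include the further restrictions raised elsewhere in the thread, such as the objection to overwriting instructions shorter than 5 bytes.

```python
FORBIDDEN = {"rel_jmp", "indirect_call", "ret"}
JMP_LEN = 5  # a jmp rel32 overwrites the bytes [x, x+5)


def can_probe_at(insns, x):
    """insns: list of (offset, length, kind) in address order, no gaps.
    Return True if a 5-byte jmp may replace the code starting at x."""
    if x not in {off for off, _, _ in insns}:
        return False  # x falls in the middle of an instruction
    # Instructions overlapping the half-open range [x, x+5).
    covered = [(off, ln, kind) for off, ln, kind in insns
               if off < x + JMP_LEN and off + ln > x]
    if sum(ln for _, ln, _ in covered) < JMP_LEN:
        return False  # the function ends before 5 bytes are available
    return all(kind not in FORBIDDEN for _, _, kind in covered)


func = [(0, 3, "other"), (3, 2, "rel_jmp"), (5, 5, "other"), (10, 1, "ret")]
print(can_probe_at(func, 0))   # → False: the 5 bytes cover a relative jmp
print(can_probe_at(func, 5))   # → True: a single 5-byte instruction
print(can_probe_at(func, 10))  # → False: not enough bytes before the end
```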

The kerninst papers explain how to avoid the constraint on the "relative
jmp" by relocating it in the allocated instruction buffer. I don't see an
obvious flaw in it, so I assume it would work if there is a need
to optimize this case.
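Relocating a relative jmp, as the kerninst papers describe, amounts to recomputing its displacement for the new address so that it still reaches the same absolute target. This is a minimal Python sketch that handles only the 5-byte `jmp rel32` form (opcode 0xE9), not short jumps or conditional branches. The function name and addresses are hypothetical.

```python
import struct


def relocate_jmp_rel32(insn, old_addr, new_addr):
    """Re-encode a 5-byte 'jmp rel32' copied from old_addr to new_addr so
    that it still jumps to the same absolute target."""
    if len(insn) != 5 or insn[0] != 0xE9:
        raise ValueError("expected a 5-byte jmp rel32")
    (rel,) = struct.unpack("<i", insn[1:5])
    target = old_addr + 5 + rel  # rel32 is relative to the next instruction
    new_rel = target - (new_addr + 5)
    if not -2**31 <= new_rel < 2**31:
        raise ValueError("target out of rel32 range from the new location")
    return bytes([0xE9]) + struct.pack("<i", new_rel)


# jmp at 0x1000 with displacement 0x10 targets 0x1015; copy it to 0x2000.
print(relocate_jmp_rel32(bytes([0xE9, 0x10, 0, 0, 0]), 0x1000, 0x2000).hex())
# → e910f0ffff  (displacement -0xff0, since 0x1015 - 0x2005 = -0xff0)
```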

I have probably missed other cases. Would someone who knows a lot more
about this fill in the missing rules so that I can do a more interesting
statistical analysis of the binaries on my system than simply counting
the number of instructions of at least 5 bytes?

Mathieu
-- 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-29  3:41   ` Mathieu Desnoyers
@ 2005-07-29  3:47     ` Karim Yaghmour
  0 siblings, 0 replies; 83+ messages in thread
From: Karim Yaghmour @ 2005-07-29  3:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Keshavamurthy, Anil S, Richard J Moore, Masami Hiramatsu,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap


Mathieu Desnoyers wrote:
> Here is what I've found:
> http://www.linuxshowcase.org/2000/2000papers/papers/moore/moore_html/
> 
> Yes, it seems like the right way to do it: making the final code change an
> atomic operation.

Well, actually I'm not looking for Richard's papers; I've seen those before :)
Rather, I'm looking for where these projects are currently hosted.

Note that I'm not looking at taking the entire thing (there is quite a
bit of history of debates around all those components, and I really wish
to concentrate on the single issue at hand: markers.)

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-29  1:48 ` Karim Yaghmour
@ 2005-07-29  3:41   ` Mathieu Desnoyers
  2005-07-29  3:47     ` Karim Yaghmour
  0 siblings, 1 reply; 83+ messages in thread
From: Mathieu Desnoyers @ 2005-07-29  3:41 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Keshavamurthy, Anil S, Richard J Moore, Masami Hiramatsu,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap

* Karim Yaghmour (karim@opersys.com) wrote:
> 
> Now if only I could find the pages where IBM now hosts its projects ...
> there are a lot of broken pointers when trying to access stuff like
> kernel hooks, dprobes, kprobes, etc. Google, at least, can't find
> anything meaningful.
> 

Here is what I've found:
http://www.linuxshowcase.org/2000/2000papers/papers/moore/moore_html/

Yes, it seems like the right way to do it: making the final code change an
atomic operation.


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-29  0:18 Keshavamurthy, Anil S
  2005-07-29  1:48 ` Karim Yaghmour
@ 2005-07-29  1:53 ` Frank Ch. Eigler
  2005-08-01  9:02   ` Mathieu Lacage
  1 sibling, 1 reply; 83+ messages in thread
From: Frank Ch. Eigler @ 2005-07-29  1:53 UTC (permalink / raw)
  To: Keshavamurthy, Anil S; +Cc: systemtap


> [...] The only _limitation_ here is that djprobe can only be placed
> if there is a static hook as mentioned above.

But that would render the facility nearly powerless.  Let us try
harder to characterize those cases where it can safely be used as an int3
substitute.

- FChE

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-29  0:18 Keshavamurthy, Anil S
@ 2005-07-29  1:48 ` Karim Yaghmour
  2005-07-29  3:41   ` Mathieu Desnoyers
  2005-07-29  1:53 ` Frank Ch. Eigler
  1 sibling, 1 reply; 83+ messages in thread
From: Karim Yaghmour @ 2005-07-29  1:48 UTC (permalink / raw)
  To: Keshavamurthy, Anil S
  Cc: Richard J Moore, Mathieu Desnoyers, Masami Hiramatsu,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap

Keshavamurthy, Anil S wrote:
> Yup, I agree with you, and this seems to be the correct way to support
> djprobe without having to worry about all the other issues which we have
> discussed earlier.
> The only _limitation_ here is that djprobe can only be placed if there
> is a static hook as mentioned above.

Actually given that kernel hooks has been around for a very long time, and
since static probe points are required anyway, I'm thinking of actually
reusing as much from kernel hooks as possible to implement the dynamic
instrumentation component of the kernel markers.

Now if only I could find the pages where IBM now hosts its projects ...
there are a lot of broken pointers when trying to access stuff like
kernel hooks, dprobes, kprobes, etc. Google, at least, can't find
anything meaningful.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: Hitachi djprobe mechanism
@ 2005-07-29  0:18 Keshavamurthy, Anil S
  2005-07-29  1:48 ` Karim Yaghmour
  2005-07-29  1:53 ` Frank Ch. Eigler
  0 siblings, 2 replies; 83+ messages in thread
From: Keshavamurthy, Anil S @ 2005-07-29  0:18 UTC (permalink / raw)
  To: Richard J Moore
  Cc: Mathieu Desnoyers, Masami Hiramatsu, Karim Yaghmour,
	Masami Hiramatsu, michel.dagenais, Roland McGrath,
	Satoshi Oshima, sugita, systemtap

 
>
>There are more efficient ways of implementing a jmp-type hook - see the
>kernel hooks package, where we evolved past this string-of-5-no-ops
>implementation. Here we moved an immediate value - 1 byte - into a reg
>and jumped on the reg being non-zero. To spring the hook we stored the
>one immediate byte in the mov instruction. This technique works quite
>well on IA64, where one can use a predicate register for the same purpose.

Yup, I agree with you, and this seems to be the correct way to support
djprobe without having to worry about all the other issues which we have
discussed earlier.
The only _limitation_ here is that djprobe can only be placed if there
is a static hook as mentioned above.
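Richard's description of the kernel hooks technique can be modeled on a byte buffer. The exact encoding used by the kernel hooks package is not given here. The %al register, the mov/test/jnz sequence, and the zero rel8 displacement below are assumptions chosen to match the description. What matters is that arming or disarming the hook is a single-byte store into the mov's immediate operand, so no instruction is ever partially written.

```python
# Hypothetical i386 encoding of a hook site in the style described above:
#   b0 00   mov  $0x0,%al     <- the immediate byte is the on/off switch
#   84 c0   test %al,%al
#   75 00   jnz  <hook call>  (rel8 displacement left as 0 in this model)
HOOK_SITE = bytes([0xB0, 0x00, 0x84, 0xC0, 0x75, 0x00])
IMM_OFFSET = 1  # offset of the mov's immediate operand


def set_hook(code, enabled):
    """Arm or disarm the hook by rewriting one byte of the site."""
    code[IMM_OFFSET] = 1 if enabled else 0


def hook_taken(code):
    # Models the CPU: jnz is taken iff the value loaded into %al is non-zero.
    return code[IMM_OFFSET] != 0


site = bytearray(HOOK_SITE)
set_hook(site, True)
print(hook_taken(site), site.hex())  # → True b00184c07500
```

As Anil notes, the catch is that such a site has to be compiled in ahead of time; it cannot be placed at an arbitrary instruction.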

-thanks,
Anil


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-26  7:53     ` Roland McGrath
@ 2005-07-27 13:02       ` Masami Hiramatsu
  0 siblings, 0 replies; 83+ messages in thread
From: Masami Hiramatsu @ 2005-07-27 13:02 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Richard J Moore, SystemTAP, sugita, Satoshi Oshima

Hi, Roland

Roland McGrath wrote:
>>  I think Kerninst is similar in effect to djprobe. both of them copy
>>original code to a buffer and jump to the buffer.
>>  However I think that the most unique feature of djprobe is use of
>>"bypass" route to safely insert code on SMP.
>>  I cannot find SMP safety mechanism like "bypass" in kerninst papers
>>yet.
> 
> 
> If by this you mean inserting an int3 while writing the rest of the jmp
> instruction and then overwriting the first byte when the rest is in place,
> I recall reading about that in some kerninst paper to be sure.

Thanks a lot.
Finally, I found it on page 9 of the OSDI paper:
"Fine-Grained Dynamic Instrumentation of Commodity Operating System Kernels",
Ariel Tamches and Barton P. Miller, OSDI, Feb 1999.

Actually, it seems to describe a similar thing.

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: hiramatu@sdl.hitachi.co.jp



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-26  7:14   ` Masami Hiramatsu
@ 2005-07-26  7:53     ` Roland McGrath
  2005-07-27 13:02       ` Masami Hiramatsu
  0 siblings, 1 reply; 83+ messages in thread
From: Roland McGrath @ 2005-07-26  7:53 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Richard J Moore, SystemTAP, sugita, Satoshi Oshima

>   I think Kerninst is similar in effect to djprobe. both of them copy
> original code to a buffer and jump to the buffer.
>   However I think that the most unique feature of djprobe is use of
> "bypass" route to safely insert code on SMP.
>   I cannot find SMP safety mechanism like "bypass" in kerninst papers
> yet.

If by this you mean inserting an int3 while writing the rest of the jmp
instruction and then overwriting the first byte when the rest is in place,
I recall reading about that in some kerninst paper to be sure.
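The sequence Roland describes (int3 first, then the rest of the jmp, then the first byte) can be traced on a byte buffer. This is a minimal Python sketch. It assumes a 5-byte `jmp rel32` and uses hypothetical addresses and names. It only records the states the five bytes pass through. In the real mechanism, a kprobe handler diverts CPUs that hit the int3 while steps 2 and 3 are pending. The model does not address CPUs already executing in the middle of the overwritten bytes, which is the concern raised elsewhere in this thread.

```python
import struct

INT3 = 0xCC
JMP_REL32 = 0xE9


def jmp_bytes(addr, target):
    """Encode 'jmp rel32' at addr; the displacement is relative to addr + 5."""
    return bytes([JMP_REL32]) + struct.pack("<i", target - (addr + 5))


def install_with_int3(code, base, addr, target):
    """Install a jmp at addr in three steps and return the successive states
    of the 5 affected bytes. `code` is a bytearray mapped at `base`."""
    off = addr - base
    new = jmp_bytes(addr, target)
    states = []
    code[off] = INT3                 # 1: CPUs now trap instead of running old code
    states.append(bytes(code[off:off + 5]))
    code[off + 1:off + 5] = new[1:]  # 2: write the displacement behind the int3
    states.append(bytes(code[off:off + 5]))
    code[off] = new[0]               # 3: one-byte store completes the jmp
    states.append(bytes(code[off:off + 5]))
    return states


text = bytearray(16)  # hypothetical code page mapped at 0x1000
for state in install_with_int3(text, 0x1000, 0x1004, 0x2000):
    print(state.hex())
# → cc00000000
#   ccf70f0000
#   e9f70f0000
```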

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-21 22:52 ` Roland McGrath
  2005-07-22  2:52   ` Richard J Moore
@ 2005-07-26  7:14   ` Masami Hiramatsu
  2005-07-26  7:53     ` Roland McGrath
  1 sibling, 1 reply; 83+ messages in thread
From: Masami Hiramatsu @ 2005-07-26  7:14 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Richard J Moore, SystemTAP, sugita, Satoshi Oshima

Hi, Roland

Roland McGrath wrote:
> They posted here, and the basic techniques are the same as the published
> "kerninst" work.

  I think Kerninst is similar in effect to djprobe. Both of them copy
original code to a buffer and jump to the buffer.
  However, I think the most distinctive feature of djprobe is its use of a
"bypass" route to safely insert code on SMP.
  I cannot find an SMP safety mechanism like "bypass" in the kerninst papers
yet.

  Please let me know if you know of one.

best regards,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: hiramatu@sdl.hitachi.co.jp

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
@ 2005-07-22 18:09 Frank Ch. Eigler
  0 siblings, 0 replies; 83+ messages in thread
From: Frank Ch. Eigler @ 2005-07-22 18:09 UTC (permalink / raw)
  To: systemtap

Hi -

richardj_moore wrote:

> Oh they did. Did anyone have any thoughts on their mechanism?

I have nothing but encouragement toward the effort.  We should exploit
the facility as soon and as far as it is safely applicable.

- FChE

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-21 22:52 ` Roland McGrath
@ 2005-07-22  2:52   ` Richard J Moore
  2005-07-26  7:14   ` Masami Hiramatsu
  1 sibling, 0 replies; 83+ messages in thread
From: Richard J Moore @ 2005-07-22  2:52 UTC (permalink / raw)
  To: systemtap





Oh they did. Did anyone have any thoughts on their mechanism?
- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


                                                                           
Roland McGrath <roland@redhat.com>
Sent by: systemtap-owner@sources.redhat.com
21/07/2005 23:51
To: Richard J Moore/UK/IBM@IBMGB
cc: SystemTAP <systemtap@sources.redhat.com>
Subject: Re: Hitachi djprobe mechanism




They posted here, and the basic techniques are the same as the published
"kerninst" work.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Hitachi djprobe mechanism
  2005-07-21 22:32 Richard J Moore
@ 2005-07-21 22:52 ` Roland McGrath
  2005-07-22  2:52   ` Richard J Moore
  2005-07-26  7:14   ` Masami Hiramatsu
  0 siblings, 2 replies; 83+ messages in thread
From: Roland McGrath @ 2005-07-21 22:52 UTC (permalink / raw)
  To: Richard J Moore; +Cc: SystemTAP

They posted here, and the basic techniques are the same as the published
"kerninst" work.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Hitachi djprobe mechanism
@ 2005-07-21 22:32 Richard J Moore
  2005-07-21 22:52 ` Roland McGrath
  0 siblings, 1 reply; 83+ messages in thread
From: Richard J Moore @ 2005-07-21 22:32 UTC (permalink / raw)
  To: SystemTAP





The guys from Hitachi at OLS have just shown me an interesting performance
innovation based on kprobes. For high-performance probing they use an
inserted jmp. This can't be installed atomically, so to get around that
problem they place a kprobe on the first instruction and then insert the jmp.
Sounds interesting. They are sending me the details.
- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2005-11-08  9:49 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
2005-07-27 21:05 Hitachi djprobe mechanism Keshavamurthy, Anil S
2005-07-28  1:51 ` Karim Yaghmour
2005-07-28  2:10   ` Karim Yaghmour
2005-07-28 16:23     ` Masami Hiramatsu
2005-07-28 16:28       ` Karim Yaghmour
2005-07-28 17:36         ` Mathieu Desnoyers
     [not found]           ` <20050728110717.A30199@unix-os.sc.intel.com>
2005-07-28 18:33             ` Mathieu Desnoyers
     [not found]               ` <20050728133456.A32210@unix-os.sc.intel.com>
2005-07-28 23:53                 ` Richard J Moore
2005-07-29  5:59                 ` Mathieu Desnoyers
2005-07-29  7:55                   ` Andi Kleen
2005-07-29  8:44                     ` Richard J Moore
2005-07-29  8:46                       ` Andi Kleen
2005-07-29 15:51                     ` Mathieu Desnoyers
2005-07-30 15:55                       ` Andi Kleen
2005-07-30 16:54                         ` Mathieu Desnoyers
2005-07-31 22:03                           ` Andi Kleen
2005-07-31 23:11                             ` Mathieu Desnoyers
2005-08-01 15:37                               ` Andi Kleen
2005-08-01  8:44                             ` Richard J Moore
2005-08-01 13:21                               ` Mathieu Desnoyers
2005-08-01 19:57                               ` Satoshi Oshima
2005-08-01 20:21                                 ` Karim Yaghmour
2005-08-01 22:12                                   ` Satoshi Oshima
2005-08-01 22:54                                     ` Karim Yaghmour
2005-08-02 18:42                                       ` Satoshi Oshima
2005-08-03 14:50                                         ` Karim Yaghmour
2005-08-04  1:19                                         ` Mathieu Desnoyers
2005-08-04  3:31                                           ` Mathieu Desnoyers
2005-08-02  9:42                                   ` Mathieu Lacage
2005-08-02 15:09                                     ` Karim Yaghmour
2005-10-07 15:35                                     ` Richard J Moore
2005-10-08 18:33                                       ` mathieu lacage
2005-10-08 21:59                                         ` Richard J Moore
2005-10-08 23:24                                           ` Roland McGrath
2005-10-22 11:49                                             ` mathieu lacage
2005-10-22 22:09                                               ` Roland McGrath
2005-10-24  6:33                                                 ` Mathieu Lacage
2005-10-24 19:48                                                   ` Roland McGrath
     [not found]                                             ` <43621B0D.70204@sophia.inria.fr>
2005-11-07 10:04                                               ` mathieu lacage
2005-11-07 10:06                                                 ` mathieu lacage
2005-11-08  9:49                                             ` Richard J Moore
2005-10-09 16:47                                           ` mathieu lacage
2005-08-02 15:33                                   ` Mathieu Lacage
2005-08-02 15:36                                     ` Mathieu Lacage
2005-08-02 16:12                                     ` Karim Yaghmour
2005-08-02 16:30                                       ` Mathieu Lacage
2005-08-02 16:46                                         ` Karim Yaghmour
2005-08-04 17:09                                         ` Mathieu Lacage
2005-08-03 14:46                                 ` Andi Kleen
2005-07-29 16:06                   ` Frank Ch. Eigler
2005-07-29 18:24                     ` sugita
2005-07-28 18:13       ` Richard J Moore
  -- strict thread matches above, loose matches on Subject: below --
2005-08-01 22:49 Keshavamurthy, Anil S
2005-08-01 23:05 ` Karim Yaghmour
2005-08-01 23:18   ` Karim Yaghmour
2005-08-01 22:41 Keshavamurthy, Anil S
2005-08-02  3:21 ` Roland McGrath
2005-08-02  3:35   ` Karim Yaghmour
2005-08-01 20:46 Keshavamurthy, Anil S
2005-08-01 21:08 ` Karim Yaghmour
2005-08-01 16:14 Keshavamurthy, Anil S
2005-08-01 20:31 ` Roland McGrath
2005-08-04  0:28   ` Mathieu Desnoyers
2005-08-04 10:01     ` Andi Kleen
2005-08-05 16:25       ` Mathieu Desnoyers
2005-08-05 16:39         ` Andi Kleen
2005-08-01 15:50 Keshavamurthy, Anil S
2005-08-01 16:03 ` Mathieu Desnoyers
2005-07-29  0:18 Keshavamurthy, Anil S
2005-07-29  1:48 ` Karim Yaghmour
2005-07-29  3:41   ` Mathieu Desnoyers
2005-07-29  3:47     ` Karim Yaghmour
2005-07-29  1:53 ` Frank Ch. Eigler
2005-08-01  9:02   ` Mathieu Lacage
2005-08-01 13:18     ` Mathieu Desnoyers
2005-08-02  7:07       ` Mathieu Lacage
2005-07-22 18:09 Frank Ch. Eigler
2005-07-21 22:32 Richard J Moore
2005-07-21 22:52 ` Roland McGrath
2005-07-22  2:52   ` Richard J Moore
2005-07-26  7:14   ` Masami Hiramatsu
2005-07-26  7:53     ` Roland McGrath
2005-07-27 13:02       ` Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).