djprobes status

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* djprobes status
@ 2006-09-14 18:46 Frank Ch. Eigler
  2006-09-15 13:49 ` Satoshi Oshima
  0 siblings, 1 reply; 23+ messages in thread
From: Frank Ch. Eigler @ 2006-09-14 18:46 UTC (permalink / raw)
  To: systemtap; +Cc: Ingo Molnar

Hi -

Ingo asked me to find out the current status of djprobes-like
technology in kprobes, and a list of the major technical problems that
blocked further progress.  Could someone who worked on that summarize?
Thanks!

- FChE

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-14 18:46 djprobes status Frank Ch. Eigler
@ 2006-09-15 13:49 ` Satoshi Oshima
  2006-09-15 16:50   ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-15 13:49 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap, Ingo Molnar

Hi Frank,

We are continuing to work on djprobes.

The problems currently we are trying to solve is...

(1)finding the safe place to put probes
(2)unregistration of the buffer that is used as out of line execution
   under kernel level preemption(CONFIG_PREEMPT=y)

We have some idea to solve problem (1), and we are working on it.

Currently (2) is very difficult to solve. This problem is equivalent
to the problem of finding RCU quiescent state without rcu_read_lock()/
rcu_read_unlock(). We need to add a couple of primitives to find such
state. But it is not easy. We may be going to push djprobe without 
kernel level preemption support like current kprobe-booster.

Satoshi OSHIMA

Frank Ch. Eigler wrote:
> Hi -
> 
> Ingo asked me to find out the current status of djprobes-like
> technology in kprobes, and a list of the major technical problems that
> blocked further progress.  Could someone who worked on that summarize?
> Thanks!
> 
> - FChE

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 13:49 ` Satoshi Oshima
@ 2006-09-15 16:50   ` Ingo Molnar
  2006-09-15 18:43     ` Satoshi Oshima
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 16:50 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap

On Fri, 2006-09-15 at 09:49 -0400, Satoshi Oshima wrote:
> The problems currently we are trying to solve is...
> 
> (1)finding the safe place to put probes
> (2)unregistration of the buffer that is used as out of line execution
>    under kernel level preemption(CONFIG_PREEMPT=y)
> 
> We have some idea to solve problem (1), and we are working on it.
> 
> Currently (2) is very difficult to solve. This problem is equivalent
> to the problem of finding RCU quiescent state without rcu_read_lock()/
> rcu_read_unlock(). We need to add a couple of primitives to find such
> state. But it is not easy. We may be going to push djprobe without 
> kernel level preemption support like current kprobe-booster.
> 

could you tell me a bit more about how you have tried to solve (2) and
why it didnt work? I cannot see any fundamental difficulty in (2), in
fact i find problem (1) much, much scarier! :-) (please also outline the
basic problem to me, maybe i'm not seeing the difficulties out of
ignorance.)

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 16:50   ` Ingo Molnar
@ 2006-09-15 18:43     ` Satoshi Oshima
  2006-09-15 18:58       ` Ingo Molnar
  2006-09-15 19:10       ` Eugeniy Meshcheryakov
  0 siblings, 2 replies; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-15 18:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

[-- Attachment #1: Type: text/plain, Size: 3222 bytes --]

Hi Ingo,

Ingo Molnar wrote:
> On Fri, 2006-09-15 at 09:49 -0400, Satoshi Oshima wrote:
>> The problems currently we are trying to solve is...
>>
>> (1)finding the safe place to put probes
>> (2)unregistration of the buffer that is used as out of line execution
>>    under kernel level preemption(CONFIG_PREEMPT=y)
>>
>> We have some idea to solve problem (1), and we are working on it.
>>
>> Currently (2) is very difficult to solve. This problem is equivalent
>> to the problem of finding RCU quiescent state without rcu_read_lock()/
>> rcu_read_unlock(). We need to add a couple of primitives to find such
>> state. But it is not easy. We may be going to push djprobe without 
>> kernel level preemption support like current kprobe-booster.
>>
> 
> could you tell me a bit more about how you have tried to solve (2) and
> why it didnt work? I cannot see any fundamental difficulty in (2), in
> fact i find problem (1) much, much scarier! :-) (please also outline the
> basic problem to me, maybe i'm not seeing the difficulties out of
> ignorance.)

I try to explain two problems above and attach djprobe document
if you are interested in the detail of djprobes. And I appreciate
your help, suggestion or comment!

(1)finding the safe place to put probes

Imagine certain binary line which is constructed by 2 byte instruction, 
2byte instruction and 3byte instruction, like below

         IA
         | 
[-2][-1][0][1][2][3][4][5][6][7]
        [ins1][ins2][  ins3 ]
        [<-     DCR       ->]
           [<- JTPR ->]

ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA:  Insertion Address
JTPR: Jump Target Prohibition Region
DCR: Detoured Code Region

If we try to put a probe onto IA, we copy all instruction in DCR to
the buffer that is in executable pages and replace 5 byte in DCR
with relative jmp instruction. The problem (1) is the case that
some other part of the kernel might try to jump or call into JTPR. 

Yes, this is scary. But basically, we have to look into all jmp,
call, etc..instructions in the same function. I believe that
there is no possibilities that someone jump into the middle of the
function from other functions. The exception is assembler code.

(2)unregistration of the buffer that is used as out of line execution
   under kernel level preemption(CONFIG_PREEMPT=y)

Djprobes copy all instruction in DCR to the buffer and add another
jmp instruction after that. In the case of unregistration, djprobes
put break point instruction and put original instruction after the
safety check. After that djprobes have to release the buffer. 

The problem is that it is very difficult to find the status that
the buffer is not in use, because someone preempt the execution
in the buffer and preempted thread is in expired and that thread
might be re-balanced to another CPU queue. 

This is very similar to RCU quiescent status. In the case of
RCU, user must make rcu_read_lock()/unlock() when being in
read critical region. rcu_read_lock()/unlock() is implemented
as preempt_disable()/enable().

For djprobe, it is unavailable to take such a strategy, because 
there is no space to put preempt_enable() after jmp instruction
in the buffer or in original code.

Satoshi

[-- Attachment #2: djprobe-20051031.txt --]
[-- Type: text/plain, Size: 14216 bytes --]

Djprobe Documentation
authors: Satoshi Oshima (soshima@redhat.com)
         Masami Hiramatsu (hiramatu@sdl.hitachi.co.jp)

INDEX

1. Djprobe concepts
2. How djprobe works
3. Further Considerations
4. Djprobe Features
5. Architectures Supported
6. Configuring Djprobe
7. API Reference 
8. TODO
9. FAQ

1. Djprobe concepts

The basic idea of Djprobe is to dynamically hook at any kernel function 
entry points and collect the debugging or performance analysis information 
non-disruptively. The functionality of djprobe is very similar to Kprobe 
or Jprobe. The distinction of djprobe is to use jump instruction instead 
of break point instruction. This distinction reduces the overhead of each 
probe.

Developers can trap at almost any kernel function entry points, specifying 
a handler routine to be invoked when the jump instruction is executed.

2. How Djprobe works

Break point instruction is easily inserted on most architecture.
For example, binary size of break point instruction on i386 or x86_64 
architecture is 1 byte. 1 byte replacement is took place in single step.
And replacement with breakpoint instruction is guaranteed as SMP safe.

On the other hand jump instruction is not easily inserted. Binary size of 
jump instruction on i386 is 5 byte. 5 byte replacement cannot be executed 
in single step. And beyond that dynamic code modification has some 
complicated restriction.

To explain the djprobe mechanism, we introduce some terminology.
Image certain binary line which is constructed by 2 byte instruction, 
2byte instruction and 3byte instruction.

         IA
         | 
[-2][-1][0][1][2][3][4][5][6][7]
        [ins1][ins2][  ins3 ]
        [<-     DCR       ->]
           [<- JTPR ->]

ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA:  Insertion Address
JTPR: Jump Target Prohibition Region
DCR: Detoured Code Region

The replacement procedure of djpopbes is 6 steps:

(1) copying instruction(s) in DCR
(2) putting break point instruction at IA
(3) scheduling works on each CPU
(4) executing CPU safety check on each work
(5) replacing original instruction(s) with jump instruction without
first byte and serializing code
(6) replacing break point instruction with first byte of jump instruction

Further explanation is given below.

(1) copying instruction(s) in DCR

Djprobe copies replaced instruction(s) to the region that djprobe allocates.
The replaced instructions must include the instruction that includes the byte 
at IA+4. Therefore the size of DCR must be 5 byte or more. The size of DCR 
must be given by djprobes user.

(2) putting break point instruction at IA

Djprobe replaces a break point instruction at Insertion Point. After this 
replacement, the djprobe act like kprobes.

(3) scheduling works on each CPU

Djprobe schedules work(s) that execute CPU safety check on each CPU, and wait 
till those works finished.

(4) executing CPU safety check on each work

Current Djprobe suppose that the context switch must NOT occur on extension 
of interruption, which means that every interruption must return before 
executing context switch.

Therefore, execution of scheduled works itself is the proof that every 
interruption stack (and every process stack) doesn't include any address 
in JTPR.

The last CPU that executes safety check work wakes the waiting process up.

(5) replacing original instruction(s) with jump instruction without first 
byte and serializing code

After all safety check works are scheduled, djprobe can replace the codes 
in JTPR safely. Because, now, any CPU is not executing JTPR. Even if a CPU 
tries to execute the instructions in the top of DCR again, the CPU is 
interrupted by kprobe and is led to execute the copied instructions. So 
any CPU does not touch the instructions in the JTPR.

Djprobe replaces the bytes in the area from IA+1 to IA+4 with jump 
instruction that doesn't contain first byte.
And it serializes the replaced code on every CPU.

(6) replacing breakpoint instruction with first byte of jump instruction

Djprobe replaces breakpoint instruction that is put by themselves with the 
first byte of jump instruction.

3. Further considerations

There are many difficulties on implementation of djprobe. In this section, 
we discuss restrictions of djprobe to understand these difficulties.

3.1 the way to confirm safety of DCR(dynamic analysis)

Djprobe tries to replace the code that includes one instruction or more.
This replacement usually accompanies changing the boundaries of instructions.
Therefore djprobe must ensure that the other CPUs don't execute DCR or every 
stack doesn't contain the address in JTPR.

3.2 confirmation of safety of DCR(static analysis)

Djprobe must also avoid JTPR must not be targeted by any jump or call 
instruction. Basically this must be extremely difficult to take place.
But some point such as function entry point can be expected that is not 
target of jump or call instruction (because function entry point contains 
fixed form that ensures the code convention.)

4. Djprobe Features

- Djprobe can probe entries of almost all functions without any interruption.

5. Architecture Supported

- i386

6. Configuring Djprobe
When configuring the kernel using make menuconfig/xconfig/oldconfig, ensure
that CONFIG_DJPROBE is set to "y".  Under "Instrumentation Support", 
look for "Direct Jump probe". You may have to enable "Kprobes" and to 
*DISABLE* "Preemptible Kernel".

7. API Reference 
The Djprobe API includes "register_djprobe" function and 
"unregister_djprobe" function. Here are specifications for these functions
and the associated probe handlers.

7.1 register_djprobe

#include <linux/djprobe.h>
int register_djprobe(struct djprobe *djp, void *addr, int size);

Inserts a jump instruction at the address addr. When the jump is
hit, Djprobe calls djp->handler.

register_djprobe() returns 0 on success, or a negative errno otherwise.

User's probe handler (djp->handler):
#include <linux/djprobe.h>
#include <linux/ptrace.h>
void handler(struct djprobe *djp, struct pt_regs *regs);

Called with p pointing to the djprobe associated with the probe point,
and regs pointing to the struct containing the registers saved when
the probe point was hit.

7.2 unregister_djprobe

#include <linux/djprobe.h>
void unregister_djprobe(struct djprobe *djp);

Removes the specified probe.  The unregister function can be called
at any time after the probe has been registered.

8. TODO

(1)support architecture transparent interface.
   (Djprobe interface emulated by kprobes)
(2)bulk registeration interface support
(3)kprobe interoperability (coexistance in same address)
(4)other architectures support

9. FAQ
Direct Jump Probe Q&A

Q: What is the Direct Jump Probe (Djprobe)?
A: Djprobe is a low overhead probe method for linux kernel. 

Q: What is different from Kprobes?
A: The most different feature is that the djprobe uses a jump instruction 
code instead of breakpoint instruction code. It can reduce overheads of 
probing especially when the probes are executed frequently.

Q: How does the djprobe work?
A: First, Djprobe copies some instructions modified by a jump instruction 
into the middle of a stub code buffer. Next, it overwrites the instructions 
with the jump instruction whose destination is the top of that stub code 
buffer. In the top of the stub code buffer, there is a call instruction 
which calls a probe function. And, in the bottom of the stub code buffer, 
there is a jump instruction whose destination is the next of the modified 
instructions. 
 On the other hand, Kprobe copies only one instruction which will be 
modified by breakpoint instruction, and overwrites it breakpoint 
instruction. When breakpoint interruption handling, it executes the copied 
instruction with the trap flag. When trap interruption handling, it 
corrects IP(*) for returning to the kernel code.
 So, djprobe's work sequence is "jump", "probe", "execute copies" and 
"jump", whereas kprobes' sequence is "break", "probe", "execute copies", 
and "trap".

(*)Instruction Pointer

Q: Does the djprobe need to modify kernel source code?
A: No. The djprobe is one of the dynamic probes. It can be inserted into 
running kernel.

Q: Can djprobe work with CPU-hotplug?
A: Yes, djprobe locks cpu-hotplug in the critical section.

Q: Where can the djprobe be inserted in?
A: Djprobe can be inserted in almost all kernel code including the head of 
almost kernel functions. The insertion area must satisfy the assumptions 
described below. 

(In i386 architecture)
         IA
         | 
[-2][-1][0][1][2][3][4][5][6][7]
        [ins1][ins2][  ins3 ]
        [<-     DCR       ->]
           [<- JTPR ->]

ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA:  Insertion Address
DCR (Detoured Code Region): The area which is including the instructions 
whose first byte is in the range in 5 bytes (this size is from the size of 
jump instruction) from the insertion address. These instructions are copied 
into the middle of a stub code buffer.
JTPR (Jump Target Prohibition Region): The area which is including the 
codes among codes rewritten in the jump instruction by djprobe except the 
first one byte.

Assumptions:
i) The insertion address points the first byte of an instruction.
  This is for avoidance of a bad instruction exception.
ii) There are no instructions which refer IP (ex. relative jmp) in DCR.
  EIP has been changed when copied instruction is executed.
iii) There are no instructions which occur context-switch (ex. call 
     schedule()) in DCR.
  If a context-switch occurs in DCR, the next address of an instruction 
 (ex. the address of "ins2") is stored in the call stack of previous thread. 
 After that, djprobe overwrites the instruction with jump instruction. When 
 the previous thread switches back, it resumes execution from the stored 
 address. So it will cause a bad instruction exception.
 iv) Destination address of jump or call is not included in JTPR.
  This is for avoidance of a bad instruction exception too.

Q: Can several djprobes be inserted in the same address?
A: Yes. Several djprobes which are inserted in the same address are 
aggregated and share one instance. 
NOTE: When a new djprobe's insertion address is in another djprobe's JTPR 
(above described), or the another djprobe's insertion address is in the new 
djprobe's JTPR, register_djprobe() fails to register the new djprobe and 
returns -EEXIST error code.

Q: Can djprobe be used with kprobes in same address?
A: No, currently djprobe can not coexist with kprobes in same address. But 
we will support this feature as soon as possible.

Q: Should the jump instruction be with in a page boundary to avoid access 
 violation and page fault?
A: No. The x86 processors can handle non-aligned instructions correctly. We 
can see many non-aligned instructions in the kernel binary. And, in the 
kernel space, there is no page fault. Kernel code pages are always mapped 
to the kernel page table. 
So it is not necessary to care of page boundaries in x86 architecture.

Q: How does the djprobe resolve problems about self/cross-modifying code? 
 In Pentium Series, Unsynchronized cross-modifying code operations except 
 the first byte of an instruction can cause unexpected instruction 
 execution results.
A: Djprobe uses a trick code to resolve the problems. It modifies the 
 instructions as following.
1) Register special handler as a kprobe handler. (And a break point 
  instruction is written on the first byte of the insertion address by 
  kprobes.)
2) Check safety (this is described in the next question's answer).
3) Write only the destination address part of jump instruction on the 
  kernel code. (This operation is not synchronized)
4) Call "cpuid" on each processor for synchronization.
5) Write the first byte of the jump instruction. (This operation is 
  synchronized automatically)

Q: How does the djprobe guarantee no threads and no processors are  
 executing the modifying area? The IP of that area may be stored in the 
 stack memory of those threads.
A: The problem would be caused for three reasons:
 i) Problem caused by the multi processor system
  Another processor may be executing the area which is overwritten by jump 
 instruction. Djprobe should guarantee no processor is executing those 
 instructions when modify it.
 ii) Problem caused by the interruption
  An interruption might have occurred in the area which is going to be 
 overwritten by jump instruction. Djprobe should guarantee all 
 interruptions which occurred in the area have finished.
 iii) Problem caused by full preempt kernel
  In case of Problem (iii), it is described in the next question's answer. 

The Djprobe uses the workqueue to resolve Problem (i) and (ii). The 
solution is described below:
1) Copy the entire of the DCR (described above) into the middle of a stub 
 code buffer.
2) Register special handler as a kprobe handler. This special handler 
 changes kprobe's resume point to the stub code buffer. 
3) Clear the safety flags of all processors.
4) Register a safety checking work to the workqueue on each processor. And 
 wait till those works are scheduled.
5) When keventd thread is scheduled on a processor, it executes the work. 
 In this time, this processor is not executing the area which is 
 overwritten by jump instruction. And also it has finished all 
 interruptions. Because, in the case of voluntary preemption or non 
 preemption kernel, the context switch does not occur in the extension of 
 interruption. 
6) The all works are scheduled, djprobe writes the jump instruction.

Q: Can the djprobe work with kernel full preemption?
A: No, but you can use the djprobe's interface. When kernel full preemption 
 is enabled, we can't ensure that no threads are executing the modified 
 area. It may be stored in the stack of the threads. In this case, the 
 djprobe interfaces are emulated by using kprobe.
 The latest linux kernel supports not only full preemption but also the 
 voluntarily preemption. In the case of voluntarily preemption, threads 
 are scheduled from only limited addresses. So it is easy to check that 
 the preemption can not occur in the modified area.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 18:43     ` Satoshi Oshima
@ 2006-09-15 18:58       ` Ingo Molnar
  2006-09-15 19:07         ` Ingo Molnar
  2006-09-15 19:24         ` Satoshi Oshima
  2006-09-15 19:10       ` Eugeniy Meshcheryakov
  1 sibling, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 18:58 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 14:43 -0400, Satoshi Oshima wrote:
> For djprobe, it is unavailable to take such a strategy, because 
> there is no space to put preempt_enable() after jmp instruction
> in the buffer or in original code.
> 
thanks for the explanation.

I'm wondering - why dont we fix up the IP of all preempted tasks back to
the restored [formerly destroyed] range?

Similarly, a mirror problem exists when inserting a probe: the matching
IPs of currently preempted tasks have to be fixed up too to point to the
right offset in the DCR buffer.

Am i missing some difficulty? The performance of probe insertion/removal
does not really matter, the critical path is probe execution.

another method to solve this would be to force all tasks in the system
to 'gather', then do the modification and then 'release' them. That is
precisely what sw-suspend/resume already does, and we could reuse that
existing infrastructure: take a look at kernel/power/process.c's
freeze_processes() and thaw_processes() functions. It takes care of
kernel threads too.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 18:58       ` Ingo Molnar
@ 2006-09-15 19:07         ` Ingo Molnar
  2006-09-15 19:12           ` Ingo Molnar
                             ` (2 more replies)
  2006-09-15 19:24         ` Satoshi Oshima
  1 sibling, 3 replies; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 19:07 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 20:50 +0200, Ingo Molnar wrote:
> Am i missing some difficulty? The performance of probe
> insertion/removal
> does not really matter, the critical path is probe execution.

a third possibility would be to generate not a jump straight into the
trampoline, but a jump to a kprobes-controlled function:

	pushw $target_IP
	ret

this makes the inserted code 6 bytes, but gives full control to kprobes:
you can start a probe in a non-patched function, which would start by
first disabling preemption. It's almost as fast as the direct jump:
modern CPUs are able to optimize such ret sequences just fine.

at the end of the DCR trampoline you'd not jump back straight to the
code, but you'd return-jump to another global kprobes function: which
would re-enable preemption and which would jump back to the next
instruction.

this is all still infinitely faster than INT3 + single-stepping.

this would make probe installation and removal quite simple, at the
expense of some minimal runtime overhead. (10 cycles at most, on modern
CPUs)

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 18:43     ` Satoshi Oshima
  2006-09-15 18:58       ` Ingo Molnar
@ 2006-09-15 19:10       ` Eugeniy Meshcheryakov
  1 sibling, 0 replies; 23+ messages in thread
From: Eugeniy Meshcheryakov @ 2006-09-15 19:10 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: Ingo Molnar, Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

[-- Attachment #1: Type: text/plain, Size: 1553 bytes --]

15 вересня 2006 о 14:43 -0400 Satoshi Oshima написав(-ла):
> (1)finding the safe place to put probes
> 
> Imagine certain binary line which is constructed by 2 byte instruction, 
> 2byte instruction and 3byte instruction, like below
> 
>          IA
>          | 
> [-2][-1][0][1][2][3][4][5][6][7]
>         [ins1][ins2][  ins3 ]
>         [<-     DCR       ->]
>            [<- JTPR ->]
> 
> ins1: 1st Instruction
> ins2: 2nd Instruction
> ins3: 3rd Instruction
> IA:  Insertion Address
> JTPR: Jump Target Prohibition Region
> DCR: Detoured Code Region
> 
> If we try to put a probe onto IA, we copy all instruction in DCR to
> the buffer that is in executable pages and replace 5 byte in DCR
> with relative jmp instruction. The problem (1) is the case that
> some other part of the kernel might try to jump or call into JTPR. 
> 
> Yes, this is scary. But basically, we have to look into all jmp,
> call, etc..instructions in the same function. I believe that
> there is no possibilities that someone jump into the middle of the
> function from other functions. The exception is assembler code.

I think this can happen with C code too. If we have code like this:

fun1(...) {
  ...
  return fun3(0, ...);
}

fun2(...) {
  return fun3(1, ...);
}

fun3(arg1, ...) {
	if (arg1) {
	   ....
	else {
           ....
	}
}

I do not know how gcc really works, but I think it can optimize code in
such way that fun1 and fun2 will jump to the middle of fun3.

-- 
Eugeniy Meshcheryakov

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 19:12           ` Ingo Molnar
@ 2006-09-15 19:12             ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 19:12 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 21:04 +0200, Ingo Molnar wrote:
> a fourth possibility would be to insert a pushf/popf sequence:
> 
> 
>         pushf
>         jmp $tramp_addr
>         popf
> 
> that makes the inserted code 2 bytes longer. A pushf/popf pair is 3-7
> cycles on a modern CPU.

a 'cli' has to be added too - so 3 bytes longer.

	Ingo
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 19:07         ` Ingo Molnar
@ 2006-09-15 19:12           ` Ingo Molnar
  2006-09-15 19:12             ` Ingo Molnar
  2006-09-15 19:52           ` Satoshi Oshima
  2006-09-16  7:17           ` djprobes status Ingo Molnar
  2 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 19:12 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 20:59 +0200, Ingo Molnar wrote:
> a third possibility would be to generate not a jump straight into the
> trampoline, but a jump to a kprobes-controlled function:
> 
a fourth possibility would be to insert a pushf/popf sequence:


	pushf
	jmp $tramp_addr
	popf

that makes the inserted code 2 bytes longer. A pushf/popf pair is 3-7
cycles on a modern CPU.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 18:58       ` Ingo Molnar
  2006-09-15 19:07         ` Ingo Molnar
@ 2006-09-15 19:24         ` Satoshi Oshima
  2006-09-15 22:32           ` Ingo Molnar
  1 sibling, 1 reply; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-15 19:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

Ingo Molnar wrote:
> On Fri, 2006-09-15 at 14:43 -0400, Satoshi Oshima wrote:
>> For djprobe, it is unavailable to take such a strategy, because 
>> there is no space to put preempt_enable() after jmp instruction
>> in the buffer or in original code.
>>
> thanks for the explanation.
> 
> I'm wondering - why dont we fix up the IP of all preempted tasks back to
> the restored [formerly destroyed] range?
> 
> Similarly, a mirror problem exists when inserting a probe: the matching
> IPs of currently preempted tasks have to be fixed up too to point to the
> right offset in the DCR buffer.

Yes, true. So djprobes put int3 first and wait for hit of it for safety.
After that replace int3 with jmp instruction.

> Am i missing some difficulty? The performance of probe insertion/removal
> does not really matter, the critical path is probe execution.
> 
> another method to solve this would be to force all tasks in the system
> to 'gather', then do the modification and then 'release' them. That is
> precisely what sw-suspend/resume already does, and we could reuse that
> existing infrastructure: take a look at kernel/power/process.c's
> freeze_processes() and thaw_processes() functions. It takes care of
> kernel threads too.

We proposed such an idea, but tons of disagreement had come at that
time;-) If you support the plan, it is worth to try it again.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 19:07         ` Ingo Molnar
  2006-09-15 19:12           ` Ingo Molnar
@ 2006-09-15 19:52           ` Satoshi Oshima
  2006-09-15 22:35             ` Ingo Molnar
  2006-09-16  7:17           ` djprobes status Ingo Molnar
  2 siblings, 1 reply; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-15 19:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

Ingo Molnar wrote:
> On Fri, 2006-09-15 at 20:50 +0200, Ingo Molnar wrote:
>> Am i missing some difficulty? The performance of probe
>> insertion/removal
>> does not really matter, the critical path is probe execution.
> 
> a third possibility would be to generate not a jump straight into the
> trampoline, but a jump to a kprobes-controlled function:
> 
> 	pushw $target_IP
> 	ret
> 
> this makes the inserted code 6 bytes, but gives full control to kprobes:
> you can start a probe in a non-patched function, which would start by
> first disabling preemption. It's almost as fast as the direct jump:
> modern CPUs are able to optimize such ret sequences just fine.

> at the end of the DCR trampoline you'd not jump back straight to the
> code, but you'd return-jump to another global kprobes function: which
> would re-enable preemption and which would jump back to the next
> instruction.
> 
> this is all still infinitely faster than INT3 + single-stepping.
> 
> this would make probe installation and removal quite simple, at the
> expense of some minimal runtime overhead. (10 cycles at most, on modern
> CPUs)

Great! 

Actually, the implementation will be much more complex because
we have to think about multiple probes and multiple processors.
But basically I can see the solution in your advice.

Thank you for your advice.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 19:24         ` Satoshi Oshima
@ 2006-09-15 22:32           ` Ingo Molnar
  2006-09-15 23:13             ` Satoshi Oshima
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 22:32 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 15:24 -0400, Satoshi Oshima wrote:
> We proposed such an idea, but tons of disagreement had come at that
> time;-) If you support the plan, it is worth to try it again.

i like this because it creates synergy between the sw-suspend code and
kprobes. Shared infrastructure is always good. What type of disagreement
was there, was it on lkml?

i dont like the performance aspect of it though: we'd need to schedule
nr_tasks times to insert probes into the kernel. So i'd consider this an
easy "plan B" approach, to be used if "plan A" fails :-)

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 19:52           ` Satoshi Oshima
@ 2006-09-15 22:35             ` Ingo Molnar
  2006-09-15 23:00               ` Satoshi Oshima
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2006-09-15 22:35 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 15:52 -0400, Satoshi Oshima wrote:
> Actually, the implementation will be much more complex because
> we have to think about multiple probes and multiple processors.
> But basically I can see the solution in your advice. 

there's a fifth option too i can think of: just dont free probe buffers
but put them into a linked list! Once there's too many of them, do the
gather/release thing once, to release a bunch of buffers at once. That
way the overhead of the gather/release will trigger only seldomly.

this is faster than the pushl based one, and much simpler as well i
believe.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 22:35             ` Ingo Molnar
@ 2006-09-15 23:00               ` Satoshi Oshima
  2006-09-16  7:11                 ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-15 23:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

Ingo Molnar wrote:
> On Fri, 2006-09-15 at 15:52 -0400, Satoshi Oshima wrote:
>> Actually, the implementation will be much more complex because
>> we have to think about multiple probes and multiple processors.
>> But basically I can see the solution in your advice. 
> 
> there's a fifth option too i can think of: just dont free probe buffers
> but put them into a linked list! Once there's too many of them, do the
> gather/release thing once, to release a bunch of buffers at once. That
> way the overhead of the gather/release will trigger only seldomly.
> 
> this is faster than the pushl based one, and much simpler as well i
> believe.

Link list approach still requires to use preempt_disable()/enable() or
freeze_process() because we can not guarantee the safety that all the 
buffer is not preempted. I think it is not enough to put buffers away
to linked list for a long time.

And yes, we will use linked list batch approach for recycle buffers.

Satoshi



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 22:32           ` Ingo Molnar
@ 2006-09-15 23:13             ` Satoshi Oshima
  0 siblings, 0 replies; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-15 23:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

Ingo Molnar wrote:
> On Fri, 2006-09-15 at 15:24 -0400, Satoshi Oshima wrote:
>> We proposed such an idea, but tons of disagreement had come at that
>> time;-) If you support the plan, it is worth to try it again.
> 
> i like this because it creates synergy between the sw-suspend code and
> kprobes. Shared infrastructure is always good. What type of disagreement
> was there, was it on lkml?

We had pushed djprobes to systemtap mailing list first, and lkml once.
And we have talked djprobes idea at OLS and other conferences.
We always recieved disagreement. Even Andrew Morton disagreed this 
approach, If I could remember correctly.

Almost all diagreements were concerned with performance of probe 
insertion.


> i dont like the performance aspect of it though: we'd need to schedule
> nr_tasks times to insert probes into the kernel. So i'd consider this an
> easy "plan B" approach, to be used if "plan A" fails :-)

Agreed. We push "plan A" first.

Satoshi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 23:00               ` Satoshi Oshima
@ 2006-09-16  7:11                 ` Ingo Molnar
  2006-09-20 10:59                   ` Masami Hiramatsu
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2006-09-16  7:11 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 19:00 -0400, Satoshi Oshima wrote:
> > there's a fifth option too i can think of: just dont free probe
> buffers
> > but put them into a linked list! Once there's too many of them, do
> the
> > gather/release thing once, to release a bunch of buffers at once.
> That
> > way the overhead of the gather/release will trigger only seldomly.
> > 
> > this is faster than the pushl based one, and much simpler as well i
> > believe.
> 
> Link list approach still requires to use preempt_disable()/enable() or
> freeze_process() because we can not guarantee the safety that all the 
> buffer is not preempted. I think it is not enough to put buffers away
> to linked list for a long time. 

yes, my suggestion above was to occasionally do (for every 1000 buffers
or 10000 buffers) something like:

	freeze_processes();
	list_for_each_entry(&list)
		release_buffer(buffer);
	thaw_processes();

this way the trampoline code is the simplest possible (and the fastest
possible) as we dont have to add preempt_disable()/enable() to the
trampoline code. Am i missing some important complication?

	Ingo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-15 19:07         ` Ingo Molnar
  2006-09-15 19:12           ` Ingo Molnar
  2006-09-15 19:52           ` Satoshi Oshima
@ 2006-09-16  7:17           ` Ingo Molnar
  2 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2006-09-16  7:17 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Frank Ch. Eigler, systemtap, Masami Hiramatsu, sugita

On Fri, 2006-09-15 at 20:59 +0200, Ingo Molnar wrote:
> a third possibility would be to generate not a jump straight into the
> trampoline, but a jump to a kprobes-controlled function:
> 
>         pushw $target_IP
>         ret 

ok, that's not enough, in this scheme we'd have to do something like:

	pushl $trampoline_offset
	callq $generic_handler

which is 10 bytes - quite large.

[generic_handler() would then do preempt_disable(), and it would call
the function pointer passed to it and then it would set up a
preempt_enable() call on the stack and jump to the trampoline. When the
trampoline does RET, we'd return to straight after the callq above.]

in that sense the freeze_processes()/thaw_processes() approach sounds
robust, because it would keep the "size of destruction" at the minimal 5
bytes.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-16  7:11                 ` Ingo Molnar
@ 2006-09-20 10:59                   ` Masami Hiramatsu
  2006-09-22 21:35                     ` Satoshi Oshima
  2006-09-25 13:03                     ` [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status) Masami Hiramatsu
  0 siblings, 2 replies; 23+ messages in thread
From: Masami Hiramatsu @ 2006-09-20 10:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Satoshi Oshima, Frank Ch. Eigler, systemtap, sugita

Hi Ingo,

Ingo Molnar wrote:
> yes, my suggestion above was to occasionally do (for every 1000 buffers
> or 10000 buffers) something like:
> 
> 	freeze_processes();
> 	list_for_each_entry(&list)
> 		release_buffer(buffer);
> 	thaw_processes();
> 
> this way the trampoline code is the simplest possible (and the fastest
> possible) as we dont have to add preempt_disable()/enable() to the
> trampoline code. Am i missing some important complication?

No, you're right. I like this idea because of architecture-
independence and simplicity.

I think this method can be applied to ensure the safety of
kprobe-booster on preemptive kernel. Actually, the kprobe-booster
implementation is very similar to the bottom half of the djprobe.
It executes copied instruction directly and jumps back to the kernel
code, instead of single-stepping.
Thus, if some processes are preempted on the copied instruction,
it cannot release the buffer which contains the copied instruction.
Currently, the kprobe-booster is disabled on the preemptive kernel,
but I'll make the patch to enable it by using these functions and try it!

Thanks a lot!

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-20 10:59                   ` Masami Hiramatsu
@ 2006-09-22 21:35                     ` Satoshi Oshima
  2006-09-25 11:49                       ` Ingo Molnar
  2006-09-25 13:03                     ` [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status) Masami Hiramatsu
  1 sibling, 1 reply; 23+ messages in thread
From: Satoshi Oshima @ 2006-09-22 21:35 UTC (permalink / raw)
  To: Masami Hiramatsu, Ingo Molnar; +Cc: Frank Ch. Eigler, systemtap, sugita

Hi Ingo,

I have been thinking how I implement djprobes on your advice.
And I have found some difficulties on some plans.

Before my explanation, I would like to introduce a couple of
rules that djprobes must follow.

(1) On out of line execution of probed code, registers and 
    stack must be the same state that is probed, 
    because probed code may touch the stack or registers. 
    This include flag registers, so we cannot prevent 
    interruption at least during the out of line execution.
    (Or we have to add other restrictions on djprobes users
     such as "don't probe the code that touch stack")

(2) Probes may nest, which means that another probe may
    be hit during interruption or exception handling.

And I explain what prevent me.

Plan A: push $target_IP; ret 
        (or push flag;push $target_IP;iret) method

Djprobes must have multiple probe points and support
multi processor environment. They have to follow rule(2).
So $target_IP must be like $target_IP[$cpuid][$depth_of_nest],
and preempted processes may move to another processor.
It is hard to fixup after out of line execution started.

Plan B: inserting pushfd+cli+jmp+popfd

If we mask interruptions on probe, we have to unmask it
during out of line execution to follow rule(1). And we
have the same problem on plan A again.

Now we have 3 options left.

Plan C: fixup preempted address in the task_struct

Plan D: making safety check on preempted address 
        in the task_struct

Plan E: freeze_processes()/thow_processes()

I realize that Plan C&D is highly archtecture dependent
and Intel CPU's erratta may bring another problem. 
And someone pointed me out that current djprobes 
implementation is very hard to port to some other 
architectures.

So We would like to try Plan E first. The code will be
simpler and mainly architecture independent.

After that, we can try to introduce current djprobes
implementation if performance optimization is required.

Anyway, thank you for your advice and appreciate your
comment.

Satoshi

Masami Hiramatsu wrote:
> Hi Ingo,
> 
> Ingo Molnar wrote:
>> yes, my suggestion above was to occasionally do (for every 1000 buffers
>> or 10000 buffers) something like:
>>
>> 	freeze_processes();
>> 	list_for_each_entry(&list)
>> 		release_buffer(buffer);
>> 	thaw_processes();
>>
>> this way the trampoline code is the simplest possible (and the fastest
>> possible) as we dont have to add preempt_disable()/enable() to the
>> trampoline code. Am i missing some important complication?
> 
> No, you're right. I like this idea because of architecture-
> independence and simplicity.
> 
> I think this method can be applied to ensure the safety of
> kprobe-booster on preemptive kernel. Actually, the kprobe-booster
> implementation is very similar to the bottom half of the djprobe.
> It executes copied instruction directly and jumps back to the kernel
> code, instead of single-stepping.
> Thus, if some processes are preempted on the copied instruction,
> it cannot release the buffer which contains the copied instruction.
> Currently, the kprobe-booster is disabled on the preemptive kernel,
> but I'll make the patch to enable it by using these functions and try it!
> 
> Thanks a lot!
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: djprobes status
  2006-09-22 21:35                     ` Satoshi Oshima
@ 2006-09-25 11:49                       ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2006-09-25 11:49 UTC (permalink / raw)
  To: Satoshi Oshima; +Cc: Masami Hiramatsu, Frank Ch. Eigler, systemtap, sugita

On Fri, 2006-09-22 at 17:34 -0400, Satoshi Oshima wrote:
> Plan E: freeze_processes()/thow_processes()
> 
> 
> I realize that Plan C&D is highly archtecture dependent
> and Intel CPU's erratta may bring another problem. 
> And someone pointed me out that current djprobes 
> implementation is very hard to port to some other 
> architectures.
> 
> So We would like to try Plan E first. The code will be
> simpler and mainly architecture independent. 

agreed. Plan E combined with "batching" would be my preference too,
because it allows the fastest implementation for probes. (probes would
not have to be aware of preemption or code modification at all, because
whatever modification is done to the code it will be in a completely
"quiet" state of the system.) This would speed up normal kprobes too.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status)
  2006-09-20 10:59                   ` Masami Hiramatsu
  2006-09-22 21:35                     ` Satoshi Oshima
@ 2006-09-25 13:03                     ` Masami Hiramatsu
  2006-09-25 19:00                       ` Keshavamurthy Anil S
  1 sibling, 1 reply; 23+ messages in thread
From: Masami Hiramatsu @ 2006-09-25 13:03 UTC (permalink / raw)
  To: Masami Hiramatsu, Keshavamurthy, Anil S, Ananth N Mavinakayanahalli
  Cc: Ingo Molnar, Satoshi Oshima, Frank Ch. Eigler, systemtap, sugita

Hi,

Here is the patch which enables kprobe-booster on
the preemptive kernel.

When we are unregistering a kprobe-booster, we can't release
its buffer immediately on the preemptive kernel, because
some processes might be preempted on the buffer.
The freeze_processes() and thaw_processes() functions can
clean those processes up from the buffer.
However, the processing of those functions takes long time.
So I introduced a kind of garbage collection(GC) routine into
insn_slots.
Currently, this GC is invoked if 1) there are more than
INSNS_PER_PAGE garbage slots or 2) there are no unused slots.
Would you have any better idea about these timings? I'd like to
receive feedbacks.
Please review it.

Thanks,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

---
 arch/i386/kernel/kprobes.c |    2 -
 kernel/kprobes.c           |   63 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 57 insertions(+), 8 deletions(-)

Index: linux-2.6.18/kernel/kprobes.c
===================================================================
--- linux-2.6.18.orig/kernel/kprobes.c	2006-09-25 21:44:22.000000000 +0900
+++ linux-2.6.18/kernel/kprobes.c	2006-09-25 21:46:22.000000000 +0900
@@ -37,6 +37,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/moduleloader.h>
+#include <linux/sched.h>
 #include <asm-generic/sections.h>
 #include <asm/cacheflush.h>
 #include <asm/errno.h>
@@ -72,9 +73,12 @@
 	kprobe_opcode_t *insns;		/* Page of instruction slots */
 	char slot_used[INSNS_PER_PAGE];
 	int nused;
+	int ngarbage;
 };

 static struct hlist_head kprobe_insn_pages;
+static int kprobe_garbage_slots;
+static void collect_garbage_slots(void);

 /**
  * get_insn_slot() - Find a slot on an executable page for an instruction.
@@ -85,6 +89,7 @@
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;

+      retry:
 	hlist_for_each(pos, &kprobe_insn_pages) {
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
 		if (kip->nused < INSNS_PER_PAGE) {
@@ -101,7 +106,12 @@
 		}
 	}

-	/* All out of space.  Need to allocate a new page. Use slot 0.*/
+	/* If there are any garbage slots, collect it and try again. */
+	if (kprobe_garbage_slots) {
+		collect_garbage_slots();
+		goto retry;
+	}
+	/* All out of space.  Need to allocate a new page. Use slot 0. */
 	kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
 	if (!kip) {
 		return NULL;
@@ -122,19 +132,29 @@
 	memset(kip->slot_used, 0, INSNS_PER_PAGE);
 	kip->slot_used[0] = 1;
 	kip->nused = 1;
+	kip->ngarbage = 0;
 	return kip->insns;
 }

-void __kprobes free_insn_slot(kprobe_opcode_t *slot)
+static void collect_garbage_slots(void)
 {
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;

+#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
+	/* Ensure no-one is preepmted on the garbages */
+	freeze_processes();
+#endif
 	hlist_for_each(pos, &kprobe_insn_pages) {
+		int i;
+	      next:
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
-		if (kip->insns <= slot &&
-		    slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
-			int i = (slot - kip->insns) / MAX_INSN_SIZE;
+		if (kip->ngarbage == 0)
+			continue;
+		kip->ngarbage = 0;	/* we will collect all garbages */
+		for (i = 0; i < INSNS_PER_PAGE; i++) {
+			if (kip->slot_used[i] != -1)
+				continue;
 			kip->slot_used[i] = 0;
 			kip->nused--;
 			if (kip->nused == 0) {
@@ -144,19 +164,48 @@
 				 * so as not to have to set it up again the
 				 * next time somebody inserts a probe.
 				 */
+				pos = pos->next;
 				hlist_del(&kip->hlist);
 				if (hlist_empty(&kprobe_insn_pages)) {
 					INIT_HLIST_NODE(&kip->hlist);
 					hlist_add_head(&kip->hlist,
-						&kprobe_insn_pages);
+						       &kprobe_insn_pages);
 				} else {
 					module_free(NULL, kip->insns);
 					kfree(kip);
 				}
+				if (pos == NULL)
+					goto collected;
+				else
+					goto next;
 			}
-			return;
 		}
 	}
+      collected:
+	kprobe_garbage_slots = 0;
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
+	thaw_processes();
+#endif
+}
+
+void __kprobes free_insn_slot(kprobe_opcode_t * slot)
+{
+	struct kprobe_insn_page *kip;
+	struct hlist_node *pos;
+
+	hlist_for_each(pos, &kprobe_insn_pages) {
+		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
+		if (kip->insns <= slot &&
+		    slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
+			kip->slot_used[(slot - kip->insns) / MAX_INSN_SIZE] =
+			    -1;
+			kip->ngarbage++;
+			break;
+		}
+	}
+	if (++kprobe_garbage_slots > INSNS_PER_PAGE) {
+		collect_garbage_slots();
+	}
 }
 #endif

Index: linux-2.6.18/arch/i386/kernel/kprobes.c
===================================================================
--- linux-2.6.18.orig/arch/i386/kernel/kprobes.c	2006-09-25 21:44:22.000000000 +0900
+++ linux-2.6.18/arch/i386/kernel/kprobes.c	2006-09-25 21:44:35.000000000 +0900
@@ -333,7 +333,7 @@
 		return 1;

 ss_probe:
-#ifndef CONFIG_PREEMPT
+#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
 	if (p->ainsn.boostable == 1 && !p->post_handler){
 		/* Boost up -- we can execute copied instructions directly */
 		reset_current_kprobe();

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status)
  2006-09-25 13:03                     ` [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status) Masami Hiramatsu
@ 2006-09-25 19:00                       ` Keshavamurthy Anil S
  2006-09-26 13:13                         ` Masami Hiramatsu
  0 siblings, 1 reply; 23+ messages in thread
From: Keshavamurthy Anil S @ 2006-09-25 19:00 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Keshavamurthy, Anil S, Ananth N Mavinakayanahalli, Ingo Molnar,
	Satoshi Oshima, Frank Ch. Eigler, systemtap, sugita

On Mon, Sep 25, 2006 at 06:03:06AM -0700, Masami Hiramatsu wrote:
Masami,
	Overall the code looks good and I just have a very 
minor comments.

> 
>    Currently, this GC is invoked if 1) there are more than
>    INSNS_PER_PAGE garbage slots or 2) there are no unused slots.
>    Would you have any better idea about these timings? I'd like to
>    receive feedbacks.
Invoking GC looks good to me.


>    -void __kprobes free_insn_slot(kprobe_opcode_t *slot)
>    +static void collect_garbage_slots(void)
>     {
>            struct kprobe_insn_page *kip;
>            struct hlist_node *pos;
>    +#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
>    +       /* Ensure no-one is preepmted on the garbages */
>    +       freeze_processes();
I think you need to check for freeze_processes() return value.
If the return value indicates failure, then you must NOT
continue collecting garbage slots, instead call 
thaw_processes() and return.
>    +#endif

[.....]

>    Index: linux-2.6.18/arch/i386/kernel/kprobes.c
>    ===================================================================
>    ---   linux-2.6.18.orig/arch/i386/kernel/kprobes.c          2006-09-25
>    21:44:22.000000000 +0900
>    +++       linux-2.6.18/arch/i386/kernel/kprobes.c           2006-09-25
>    21:44:35.000000000 +0900
>    @@ -333,7 +333,7 @@
>                    return 1;
>     ss_probe:
>    -#ifndef CONFIG_PREEMPT
>    +#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)

Now you are supporting booster probes independent of CONFIG_PREEMPT (i.e
you are supporting booster probes for CONFIG_PREEMPT defined and for NOT defined cases),
so just defined(CONFIG_PM) should be fine I think.

>            if (p->ainsn.boostable == 1 && !p->post_handler){
>                      /*  Boost  up  -- we can execute copied instructions
>    directly */
>                    reset_current_kprobe();


thanks,
Anil Keshavamurthy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status)
  2006-09-25 19:00                       ` Keshavamurthy Anil S
@ 2006-09-26 13:13                         ` Masami Hiramatsu
  0 siblings, 0 replies; 23+ messages in thread
From: Masami Hiramatsu @ 2006-09-26 13:13 UTC (permalink / raw)
  To: Keshavamurthy Anil S
  Cc: Ananth N Mavinakayanahalli, Ingo Molnar, Satoshi Oshima,
	Frank Ch. Eigler, systemtap, sugita

Hi Anil,

Thank you for the advice.

Keshavamurthy Anil S wrote:
> On Mon, Sep 25, 2006 at 06:03:06AM -0700, Masami Hiramatsu wrote:
> Masami,
> 	Overall the code looks good and I just have a very 
> minor comments.
> 
>>    Currently, this GC is invoked if 1) there are more than
>>    INSNS_PER_PAGE garbage slots or 2) there are no unused slots.
>>    Would you have any better idea about these timings? I'd like to
>>    receive feedbacks.
> Invoking GC looks good to me.

Thanks.

>>    -void __kprobes free_insn_slot(kprobe_opcode_t *slot)
>>    +static void collect_garbage_slots(void)
>>     {
>>            struct kprobe_insn_page *kip;
>>            struct hlist_node *pos;
>>    +#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
>>    +       /* Ensure no-one is preepmted on the garbages */
>>    +       freeze_processes();
> I think you need to check for freeze_processes() return value.
> If the return value indicates failure, then you must NOT
> continue collecting garbage slots, instead call 
> thaw_processes() and return.

OK, I'll check its return value.

>>    +#endif
> 
> [.....]
> 
>>    Index: linux-2.6.18/arch/i386/kernel/kprobes.c
>>    ===================================================================
>>    ---   linux-2.6.18.orig/arch/i386/kernel/kprobes.c          2006-09-25
>>    21:44:22.000000000 +0900
>>    +++       linux-2.6.18/arch/i386/kernel/kprobes.c           2006-09-25
>>    21:44:35.000000000 +0900
>>    @@ -333,7 +333,7 @@
>>                    return 1;
>>     ss_probe:
>>    -#ifndef CONFIG_PREEMPT
>>    +#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
> 
> Now you are supporting booster probes independent of CONFIG_PREEMPT (i.e
> you are supporting booster probes for CONFIG_PREEMPT defined and for NOT defined cases),
> so just defined(CONFIG_PM) should be fine I think.

There is another case; Both of CONFIG_PREEMPT and CONFIG_PM are NOT
defined. In this case, the kprobe-booster should be enabled. But
if we check only CONFIG_PM, we can not enable that in this case.

Thanks,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

---
 arch/i386/kernel/kprobes.c |    2 -
 kernel/kprobes.c           |   67 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 61 insertions(+), 8 deletions(-)

Index: linux-2.6.18/kernel/kprobes.c
===================================================================
--- linux-2.6.18.orig/kernel/kprobes.c	2006-09-25 21:44:22.000000000 +0900
+++ linux-2.6.18/kernel/kprobes.c	2006-09-26 17:59:04.000000000 +0900
@@ -37,6 +37,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/moduleloader.h>
+#include <linux/sched.h>
 #include <asm-generic/sections.h>
 #include <asm/cacheflush.h>
 #include <asm/errno.h>
@@ -72,9 +73,12 @@
 	kprobe_opcode_t *insns;		/* Page of instruction slots */
 	char slot_used[INSNS_PER_PAGE];
 	int nused;
+	int ngarbage;
 };

 static struct hlist_head kprobe_insn_pages;
+static int kprobe_garbage_slots;
+static int collect_garbage_slots(void);

 /**
  * get_insn_slot() - Find a slot on an executable page for an instruction.
@@ -85,6 +89,7 @@
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;

+      retry:
 	hlist_for_each(pos, &kprobe_insn_pages) {
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
 		if (kip->nused < INSNS_PER_PAGE) {
@@ -101,7 +106,11 @@
 		}
 	}

-	/* All out of space.  Need to allocate a new page. Use slot 0.*/
+	/* If there are any garbage slots, collect it and try again. */
+	if (kprobe_garbage_slots && collect_garbage_slots() == 0) {
+		goto retry;
+	}
+	/* All out of space.  Need to allocate a new page. Use slot 0. */
 	kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
 	if (!kip) {
 		return NULL;
@@ -122,19 +131,31 @@
 	memset(kip->slot_used, 0, INSNS_PER_PAGE);
 	kip->slot_used[0] = 1;
 	kip->nused = 1;
+	kip->ngarbage = 0;
 	return kip->insns;
 }

-void __kprobes free_insn_slot(kprobe_opcode_t *slot)
+static int collect_garbage_slots(void)
 {
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;
+	int ret = -1;

+#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
+	/* Ensure no-one is preepmted on the garbages */
+	if (freeze_processes() != 0)
+		goto thaw_all;
+#endif
 	hlist_for_each(pos, &kprobe_insn_pages) {
+		int i;
+	      next:
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
-		if (kip->insns <= slot &&
-		    slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
-			int i = (slot - kip->insns) / MAX_INSN_SIZE;
+		if (kip->ngarbage == 0)
+			continue;
+		kip->ngarbage = 0;	/* we will collect all garbages */
+		for (i = 0; i < INSNS_PER_PAGE; i++) {
+			if (kip->slot_used[i] != -1)
+				continue;
 			kip->slot_used[i] = 0;
 			kip->nused--;
 			if (kip->nused == 0) {
@@ -144,19 +165,51 @@
 				 * so as not to have to set it up again the
 				 * next time somebody inserts a probe.
 				 */
+				pos = pos->next;
 				hlist_del(&kip->hlist);
 				if (hlist_empty(&kprobe_insn_pages)) {
 					INIT_HLIST_NODE(&kip->hlist);
 					hlist_add_head(&kip->hlist,
-						&kprobe_insn_pages);
+						       &kprobe_insn_pages);
 				} else {
 					module_free(NULL, kip->insns);
 					kfree(kip);
 				}
+				if (pos == NULL)
+					goto collected;
+				else
+					goto next;
 			}
-			return;
 		}
 	}
+      collected:
+	kprobe_garbage_slots = 0;
+	ret = 0;
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
+      thaw_all:
+	thaw_processes();
+#endif
+	return ret;
+}
+
+void __kprobes free_insn_slot(kprobe_opcode_t * slot)
+{
+	struct kprobe_insn_page *kip;
+	struct hlist_node *pos;
+
+	hlist_for_each(pos, &kprobe_insn_pages) {
+		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
+		if (kip->insns <= slot &&
+		    slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
+			kip->slot_used[(slot - kip->insns) / MAX_INSN_SIZE] =
+			    -1;
+			kip->ngarbage++;
+			break;
+		}
+	}
+	if (++kprobe_garbage_slots > INSNS_PER_PAGE) {
+		collect_garbage_slots();
+	}
 }
 #endif

Index: linux-2.6.18/arch/i386/kernel/kprobes.c
===================================================================
--- linux-2.6.18.orig/arch/i386/kernel/kprobes.c	2006-09-25 21:44:22.000000000 +0900
+++ linux-2.6.18/arch/i386/kernel/kprobes.c	2006-09-26 16:39:15.000000000 +0900
@@ -333,7 +333,7 @@
 		return 1;

 ss_probe:
-#ifndef CONFIG_PREEMPT
+#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
 	if (p->ainsn.boostable == 1 && !p->post_handler){
 		/* Boost up -- we can execute copied instructions directly */
 		reset_current_kprobe();



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2006-09-26 13:13 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-14 18:46 djprobes status Frank Ch. Eigler
2006-09-15 13:49 ` Satoshi Oshima
2006-09-15 16:50   ` Ingo Molnar
2006-09-15 18:43     ` Satoshi Oshima
2006-09-15 18:58       ` Ingo Molnar
2006-09-15 19:07         ` Ingo Molnar
2006-09-15 19:12           ` Ingo Molnar
2006-09-15 19:12             ` Ingo Molnar
2006-09-15 19:52           ` Satoshi Oshima
2006-09-15 22:35             ` Ingo Molnar
2006-09-15 23:00               ` Satoshi Oshima
2006-09-16  7:11                 ` Ingo Molnar
2006-09-20 10:59                   ` Masami Hiramatsu
2006-09-22 21:35                     ` Satoshi Oshima
2006-09-25 11:49                       ` Ingo Molnar
2006-09-25 13:03                     ` [RFC][kprobes]Preempt enable kprobe-booster(Re: djprobes status) Masami Hiramatsu
2006-09-25 19:00                       ` Keshavamurthy Anil S
2006-09-26 13:13                         ` Masami Hiramatsu
2006-09-16  7:17           ` djprobes status Ingo Molnar
2006-09-15 19:24         ` Satoshi Oshima
2006-09-15 22:32           ` Ingo Molnar
2006-09-15 23:13             ` Satoshi Oshima
2006-09-15 19:10       ` Eugeniy Meshcheryakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).