public inbox for systemtap@sourceware.org
* [Bug uprobes/5509] New: uprobe booster thoughts
@ 2007-12-18 17:29 jkenisto at us dot ibm dot com
  2009-03-13 18:28 ` [Bug uprobes/5509] " srikar at linux dot vnet dot ibm dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: jkenisto at us dot ibm dot com @ 2007-12-18 17:29 UTC (permalink / raw)
  To: systemtap

For consideration down the road...  The basic idea is to "boost"
uprobes and uretprobes in the same way that we boost kprobes and
kretprobes (currently for i386 and x86_64 only).

Reviewing Masami's x86_64 k[ret]probe booster patches made me think
about this some more.  Masami and I talked about this briefly at
OLS this year, but we didn't discuss details.

Boosting uprobes was not feasible at that time because the current
slot-allocation scheme employs only "public" (stealable) slots and
therefore requires a return to kernel space after the single-step to
do an up_read() on the instruction slot's rwsem.

On the other hand, the scheme proposed in SystemTap bz5275 employs
mostly private slots, which don't need to be locked and so could
conceivably be boosted.

uretprobe booster
-----------------
I don't think we can boost uretprobes the way we do kretprobes.
The kretprobe booster involves replacing the int3 at the kretprobe
trampoline with code that saves regs, calls the trampoline handler,
restores regs, and returns to the probed function.  But for uretprobes,
we need the int3 to get us into kernel mode.

uprobe booster
--------------
The idea of a uprobe booster is the same as for a kprobe booster:
in the SSOL slot, append a jump instruction after the copy of the
probed instruction.  The jump is to the instruction following the
probed instruction.  This allows us to avoid single-stepping the
instruction copy, which should save nearly 50% of the overhead of
a uprobe hit.

We currently hold the uprobe_process read-locked while processing
the probepoint, and don't unlock it 'til after we've single-stepped
the instruction copy and called uprobe_post_ssout() to run any fixups.
Seems like we could just unlock it before returning control to the
instruction copy+jump in the SSOL slot.

Boosting (adding the jump instruction) is done in uprobe_post_ssout().
Serialize this operation with existing ppt->slot_mutex.  Need memory
barriers here, since private slots are unlocked?

What happens if the probepoint is unregistered while one or more
threads are executing the instructions in the SSOL slot?  We can't free
up the SSOL slot while it's still in use. The uprobe_process->rwsem
no longer explicitly protects us there.  But we can take advantage of
the fact that all threads in the probed process are quiesced when we
remove a probepoint.  We can detect whether a thread is currently
in the SSOL slot by checking the ip.  It could conceivably be stopped
at the instruction-copy or at the jump.  If it's stopped at the
instruction-copy, we adjust the ip to point to the (now restored)
original instruction.  If it's stopped at the jump, we point the
ip at the next instruction (whose address we know from boosting the
probed instruction).
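
A minimal sketch of the ip adjustment described above.  The struct and
function names here are hypothetical, not existing uprobes code, and the
sketch assumes the "next instruction" sits directly after the probed one,
which should hold for the non-branching instructions that are candidates
for boosting:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one boosted probepoint. */
struct boosted_slot {
    uint64_t slot_vaddr;   /* start of the instruction copy in the SSOL slot */
    uint64_t probe_vaddr;  /* address of the original probed instruction */
    uint64_t insn_len;     /* length of the probed instruction in bytes */
};

/*
 * Called for each quiesced thread while the probepoint is being removed.
 * Returns the ip the thread should resume at.
 */
static uint64_t fixup_ip_on_unregister(const struct boosted_slot *s, uint64_t ip)
{
    assert(s != NULL);

    /* Stopped at the instruction copy: rerun the (now restored) original. */
    if (ip == s->slot_vaddr)
        return s->probe_vaddr;

    /* Stopped at the appended jump: the copy has already executed. */
    if (ip == s->slot_vaddr + s->insn_len)
        return s->probe_vaddr + s->insn_len;

    /* Not in this slot: leave the thread alone. */
    return ip;
}
```

Only two ip values need checking because a thread can only be stopped on
an instruction boundary, and the slot holds exactly two instructions.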

What are the implications for running utask_fake_quiesce() and
uprobe_run_def_regs(), which are currently called (if needed) after
the instruction copy has been single-stepped and the uprobe_process
has been unlocked?

uprobe booster for x86_64
-------------------------
For x86_64, the user address space is very large.  In particular, a
jump instruction with a 32-bit offset from the SSOL area won't reach all
(or even most) of the probed process's text areas.  However, we can
do an indirect jump of the following form
	jmpq *(%rip)
	.quad next_insn
where next_insn is the address of the instruction to which we want to
jump.  (This is an indirect jump to the address stored in the 8 bytes
following the jmpq instruction.)
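
As a rough illustration, the bytes for that sequence could be written
into a slot as follows.  The function name and BOOST_JMP_BYTES are made up
for this sketch; only the opcode bytes (ff 25 followed by a zero 32-bit
displacement) and the 8-byte target come from the description here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOST_JMP_BYTES 14  /* 6-byte jmpq *0(%rip) + 8-byte target address */

/*
 * Write "jmpq *(%rip); .quad next_insn" at buf, which must have room for
 * BOOST_JMP_BYTES bytes.  Returns the number of bytes written.
 */
static size_t emit_boost_jump(uint8_t *buf, uint64_t next_insn)
{
    static const uint8_t jmpq_rip[6] = { 0xff, 0x25, 0x00, 0x00, 0x00, 0x00 };
    int i;

    assert(buf != NULL);
    memcpy(buf, jmpq_rip, sizeof(jmpq_rip));

    /* The target is stored little-endian, byte by byte, so the result
     * does not depend on the byte order of the machine building it. */
    for (i = 0; i < 8; i++)
        buf[6 + i] = (uint8_t)(next_insn >> (8 * i));

    return BOOST_JMP_BYTES;
}
```

In a slot, buf would point just past the copy of the probed instruction.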

The above instruction sequence takes 14 bytes: 6 bytes for the jmpq
(always ff 25 00 00 00 00) and 8 bytes for the address.  For x86_64,
MAX_UINSN_BYTES=16, which doesn't leave much room for the actual
instruction copy.  We seem to have the following choices:
a) Boost only 1-byte and 2-byte instructions.  (Ick)
b) Make MAX_UINSN_BYTES larger.
c) Allocate 2 SSOL slots for a boostable instruction.
d) Allocate some big (boostable) slots and some little ones.

I prefer (b).  (c) and (d) complicate the slot allocation algorithm,
which so far is architecture-independent.  Note that there's no
particular reason we can't allocate more than one 4096-byte page to
the SSOL area.

-- 
           Summary: uprobe booster thoughts
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: uprobes
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: jkenisto at us dot ibm dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=5509

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug uprobes/5509] uprobe booster thoughts
From: srikar at linux dot vnet dot ibm dot com @ 2009-03-13 18:28 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From srikar at linux dot vnet dot ibm dot com  2009-03-13 16:09 -------
> probed instruction.  This allows us to avoid single-stepping the
> instruction copy, which should save nearly 50% of the overhead of
> a uprobe hit.
> 
> 
> The above instruction sequence takes 14 bytes: 6 bytes for the jmpq
> (always ff 25 00 00 00 00) and 8 bytes for the address.  For x86_64,
> MAX_UINSN_BYTES=16, which doesn't leave much room for the actual
> instruction copy.  We seem to have the following choices:
> a) Boost only 1-byte and 2-byte instructions.  (Ick)
> b) Make MAX_UINSN_BYTES larger.

How much larger would make it feasible?  Would going from the existing 16 bytes
to 24 be good enough?

> c) Allocate 2 SSOL slots for a boostable instruction.
> d) Allocate some big (boostable) slots and some little ones.
> 
> I prefer (b).  (c) and (d) complicate the slot allocation algorithm,
> which so far is architecture-independent.  Note that there's no
> particular reason we can't allocate more than one 4096-byte page to
> the SSOL area.

Now that we are looking at an instruction analysis layer, it would be possible
to revisit option (d), i.e.:
A. Big slots for private and boostable instructions with instruction size
greater than 2 bytes.
B. Small slots for public or boostable instructions with instruction size of
2 bytes or less.
 
How much additional complexity would this add?  Would the performance gain
that we get justify it?

Though it would not solve 9826 completely, the solution for this problem could
act as a workaround for all cases where we can boost the instruction.


* [Bug uprobes/5509] uprobe booster thoughts
From: srikar at linux dot vnet dot ibm dot com @ 2009-03-13 18:48 UTC (permalink / raw)
  To: systemtap



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |srikar at linux dot vnet dot
                   |                            |ibm dot com



* [Bug uprobes/5509] uprobe booster thoughts
From: jkenisto at us dot ibm dot com @ 2009-03-13 19:24 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jkenisto at us dot ibm dot com  2009-03-13 18:23 -------
(In reply to comment #1)
...
> > The above instruction sequence takes 14 bytes: 6 bytes for the jmpq
> > (always ff 25 00 00 00 00) and 8 bytes for the address.  For x86_64,
> > MAX_UINSN_BYTES=16, which doesn't leave much room for the actual
> > instruction copy.  We seem to have the following choices:
> > a) Boost only 1-byte and 2-byte instructions.  (Ick)
> > b) Make MAX_UINSN_BYTES larger.
> 
> How much larger would make it feasible?  Would going from the existing 16
> bytes to 24 be good enough?

Yes.  Looking at a variety of 64-bit a.outs, it appears that ~99% of
instructions are 10 bytes or less.  A 24-byte slot would leave room for a
10-byte instruction + the 14-byte jump.
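
The arithmetic, as a small sketch.  The names are illustrative only; the
14-byte jump, the 16- and 24-byte slot sizes, and the 4096-byte page are
the figures already discussed in this bug:

```c
#include <assert.h>

#define BOOST_JMP_BYTES 14u    /* jmpq *(%rip) + .quad target */
#define SSOL_PAGE_BYTES 4096u  /* one page of SSOL area */

/* Longest instruction that still leaves room for the appended jump. */
static unsigned max_boostable_insn_len(unsigned slot_bytes)
{
    assert(slot_bytes >= BOOST_JMP_BYTES);
    return slot_bytes - BOOST_JMP_BYTES;
}

/* Number of whole slots of the given size that fit in one SSOL page. */
static unsigned slots_per_page(unsigned slot_bytes)
{
    assert(slot_bytes > 0);
    return SSOL_PAGE_BYTES / slot_bytes;
}
```

With 16-byte slots this gives a 2-byte limit and 256 slots per page; with
24-byte slots, a 10-byte limit and 170 slots per page.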

> 
> > c) Allocate 2 SSOL slots for a boostable instruction.
> > d) Allocate some big (boostable) slots and some little ones.
> > 
...
> 
> Now that we are looking at an instruction analysis layer, it would be possible
> to revisit option (d), i.e.:
> A. Big slots for private and boostable instructions with instruction size
> greater than 2 bytes.
> B. Small slots for public or boostable instructions with instruction size of
> 2 bytes or less.

Typically, 15-25% are 1-2 bytes (but that may be high due to stuff like nop
padding).

>  
> How much additional complexity would this add?

Hard to say.  You could wind up with something more complicated than malloc if
you get too cute, but having just 2 slot pools (big/private and small/public)
wouldn't be much more complicated than what I prototyped for #5275.

> Would the performance gain
> that we get justify it?

Having multiple slot sizes would save memory (i.e., the size of the SSOL vma),
but I don't think it would otherwise help performance.  As previously mentioned,
the performance gain from boosting should be close to 50%.

> 
> Though it would not solve 9826 completely, the solution for this problem could
> act as a workaround for all cases where we can boost the instruction.

Well, it would reduce our exposure by reducing single-stepping.  But I think
that fixing 9826 should be easier than implementing boosting.


* [Bug uprobes/5509] uprobe booster thoughts
From: jkenisto at us dot ibm dot com @ 2009-03-13 20:31 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jkenisto at us dot ibm dot com  2009-03-13 18:27 -------
Created an attachment (id=3821)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=3821&action=view)
awk script: histogram of x86_64 instruction lengths

Here's the awk script I used to get the stats for comment #2.


* [Bug uprobes/5509] uprobe booster thoughts
From: jkenisto at us dot ibm dot com @ 2009-04-21 23:40 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jkenisto at us dot ibm dot com  2009-04-21 23:40 -------
Here's my list of the x86 instruction opcodes that are acceptable to
uprobes* but are not boostable.  In some cases -- e.g., loop, prefetch
-- I deferred to x86 kprobes' opinion as to what's boostable.

1-byte opcodes:
70-7f: (conditional) relative jumps
9a: call
e0-e3: loop
e8: call
e9: relative jump
eb: relative jump
ff: Group 5 - generally OK, but includes two call instructions (reg = 2, 3)

2-byte opcodes (first byte = 0f):
18: prefetch
80-8f: (conditional) relative jumps

Also, for floating-point instructions, we need to do some more testing
to gain more confidence in our instruction-length computations.

Relative jumps aren't boostable because they can leave you in
the neighborhood of the SSOL slot, but not in it.  Calls aren't
boostable because you must fix up the return address -- and some are
also relative.

* "Unacceptable" opcodes include invalid opcodes, privileged instructions,
instruction prefixes, in/out instructions, interrupt instructions, and
a few others that are dubious for one reason or another.


* [Bug uprobes/5509] uprobe booster thoughts
From: jkenisto at us dot ibm dot com @ 2009-04-22  0:12 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jkenisto at us dot ibm dot com  2009-04-22 00:12 -------
Add one more opcode to the unboostable list: 0f 0d (another prefetch instruction).
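
Putting the list from the previous comment together with this addition, a
check might look roughly like the following.  This is only a sketch: the
function name is hypothetical, it assumes op points at the first opcode
byte of an instruction uprobes has already accepted, and it assumes the
byte after ff is the ModRM byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the opcode is on the "acceptable but not boostable" list. */
static bool opcode_is_unboostable(const uint8_t *op)
{
    uint8_t b = op[0];

    assert(op != NULL);

    if (b == 0x0f) {                        /* 2-byte opcodes */
        uint8_t b2 = op[1];
        return b2 == 0x0d || b2 == 0x18     /* prefetch */
            || (b2 >= 0x80 && b2 <= 0x8f);  /* conditional relative jumps */
    }

    if (b >= 0x70 && b <= 0x7f)             /* conditional relative jumps */
        return true;
    if (b >= 0xe0 && b <= 0xe3)             /* loop */
        return true;

    switch (b) {
    case 0x9a:                              /* call */
    case 0xe8:                              /* call */
    case 0xe9:                              /* relative jump */
    case 0xeb:                              /* relative jump */
        return true;
    case 0xff: {                            /* Group 5: reg 2 and 3 are calls */
        unsigned reg = (op[1] >> 3) & 7;
        return reg == 2 || reg == 3;
    }
    default:
        return false;
    }
}
```

For example, e8 (call) and 0f 84 (conditional jump) are rejected, while
90 (nop) and ff c0 (Group 5, reg = 0) pass.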


* [Bug uprobes/5509] uprobe booster thoughts
From: jkenisto at us dot ibm dot com @ 2009-06-30 18:35 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jkenisto at us dot ibm dot com  2009-06-30 18:34 -------
In case it's not obvious, rip-relative x86_64 instructions can't be boosted
because (at least as currently handled) they require a fixup: restoring the
value of the scratch register.
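
For reference, a sketch of how such instructions can be recognized.  In
64-bit mode a ModRM byte with mod = 00 and r/m = 101 selects rip-relative
addressing.  The function name is made up, and the caller is assumed to
have already located the ModRM byte of an instruction that has one:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ModRM layout: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0).
 * Mask out reg and test for mod == 00, r/m == 101.
 */
static bool modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xc7) == 0x05;
}
```

For example, ModRM 05 (as in 8b 05 disp32) is rip-relative, whereas 45
(disp8 off %rbp) and 04 (SIB byte follows) are not.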

