Re: [RFC][PATCH 2/2] uprobes: single-step out of line

public inbox for systemtap@sourceware.org

* Re: [RFC][PATCH 2/2] uprobes: single-step out of line
@ 2007-05-05  1:07 Ernie Petrides
  2007-05-07 22:02 ` Jim Keniston
  0 siblings, 1 reply; 6+ messages in thread
From: Ernie Petrides @ 2007-05-05  1:07 UTC (permalink / raw)
  To: Jim Keniston; +Cc: Frank Ch. Eigler, Linda Wang, Ernie Petrides, systemtap

On Friday, 20-Apr-2007 at 15:09 PDT, Jim Keniston wrote:

> This patch enhances uprobes to use "single-stepping out of line"
> (SSOL) to avoid probe misses in multithreaded applications.  SSOL also
> reduces probe overhead by 25-30%.

Hi, Jim.  I just had one question about this (mainly because I'm not
able to work through all the nitty gritty details of the patch).

Is there a way to avoid adding the "uprobe_ssol_area" struct into the
"mm_struct"?  If so, the uprobes module could be easily back-ported to
kABI-frozen distros of the Linux kernel.  If the "mm_struct" ends up
getting changed, that changes the "task_struct" layout, thus breaking
binary compatibility with 3rd-party kernel modules.  (We're not allowed
to do this in RHEL distros.)

Cheers.  -ernie


* Re: [RFC][PATCH 2/2] uprobes: single-step out of line
  2007-05-05  1:07 [RFC][PATCH 2/2] uprobes: single-step out of line Ernie Petrides
@ 2007-05-07 22:02 ` Jim Keniston
  2007-05-09  1:32   ` Ernie Petrides
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Keniston @ 2007-05-07 22:02 UTC (permalink / raw)
  To: Ernie Petrides; +Cc: Frank Ch. Eigler, Linda Wang, systemtap

On Fri, 2007-05-04 at 21:09 -0400, Ernie Petrides wrote:
> On Friday, 20-Apr-2007 at 15:09 PDT, Jim Keniston wrote:
> 
> > This patch enhances uprobes to use "single-stepping out of line"
> > (SSOL) to avoid probe misses in multithreaded applications.  SSOL also
> > reduces probe overhead by 25-30%.
> 
> 
> Hi, Jim.  I just had one question about this (mainly because I'm not
> able to work through all the nitty gritty details of the patch).
> 
> Is there a way to avoid adding the "uprobe_ssol_area" struct into the
> "mm_struct"?  If so, the uprobes module could be easily back-ported to
> kABI-frozen distros of the Linux kernel.  If the "mm_struct" ends up
> getting changed, that changes the "task_struct" layout, thus breaking
> binary compatibility with 3rd-party kernel modules.  (We're not allowed
> to do this in RHEL distros.)
> 
> Cheers.  -ernie

Hmmm.  It does indeed change the layout of struct mm_struct.  I don't
see how it changes the layout of task_struct, since task_struct
contains only pointers to mm_structs.  But changing mm_struct itself
is bad, right?

An obvious alternative is for uprobes to maintain this pointer
in one of its own data structures.  Currently, when the last uprobe
for a process is unregistered, we discard the uprobe_process and
uprobe_tasks, and the only thing that remains is the pointer to
the uprobe_ssol_area (in mm_context).  We need to remember that
pointer in case the process is probed again -- we want to reuse the
vma.  Two sub-alternatives occur to me:

1a. Once a process is probed, keep the uprobe_process (and
uprobe_tasks and utrace engines) around 'til the process dies.
(We need the utrace engines because they're what tell us when the
threads die.)

1b. Store just the uprobe_ssol_area pointer in a hash table.
Every time a process exits, check that hash table and free up
the little pointer object if there is one for that process.
I'm not sure where I'd put that check -- maybe __mmdrop()?
I don't know my way around do_exit() very well.  And I wonder
if people will object to adding more code to the exit path.
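
As a rough illustration of sub-alternative 1b, here is a minimal
user-space sketch of such a table.  Everything in it is hypothetical
(the names ssol_ref, ssol_remember, ssol_lookup and ssol_forget are not
part of the patch), the key is just an opaque pointer standing in for
the mm_struct pointer, and locking is omitted; a kernel version would
presumably use hlists and a lock.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SSOL_HASH_BITS	5
#define SSOL_TABLE_SIZE	(1u << SSOL_HASH_BITS)

/* Hypothetical record: remembers the SSOL vma address for one address space. */
struct ssol_ref {
	const void *mm;		/* key: stands in for a struct mm_struct pointer */
	unsigned long vaddr;	/* user address of the SSOL page */
	struct ssol_ref *next;	/* hash-chain link */
};

static struct ssol_ref *ssol_table[SSOL_TABLE_SIZE];

static unsigned int ssol_hash(const void *mm)
{
	/* Drop the low (alignment) bits, then mask down to a table index. */
	return (unsigned int)(((uintptr_t)mm >> 4) & (SSOL_TABLE_SIZE - 1));
}

/* Remember the area for this mm.  Returns 0, or -1 if allocation fails. */
static int ssol_remember(const void *mm, unsigned long vaddr)
{
	unsigned int h = ssol_hash(mm);
	struct ssol_ref *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->mm = mm;
	r->vaddr = vaddr;
	r->next = ssol_table[h];
	ssol_table[h] = r;
	return 0;
}

/* Returns the saved address, or 0 if this mm has no record. */
static unsigned long ssol_lookup(const void *mm)
{
	struct ssol_ref *r;

	for (r = ssol_table[ssol_hash(mm)]; r; r = r->next)
		if (r->mm == mm)
			return r->vaddr;
	return 0;
}

/* What the exit path would call: free the record if there is one. */
static void ssol_forget(const void *mm)
{
	struct ssol_ref **pp = &ssol_table[ssol_hash(mm)];

	while (*pp) {
		if ((*pp)->mm == mm) {
			struct ssol_ref *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;
		}
		pp = &(*pp)->next;
	}
}
```

The cost that matters for 1b is the ssol_forget() call: it would run
for every exiting process, probed or not, which is the exit-path
overhead in question.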

Another alternative is to free up the vma every time we free
the uprobe_process.  I don't like that idea much.

One thing to keep in mind is that with uprobes patch #4 (was patch
#3 in Q1 2007 -- hasn't been reposted recently), uprobe handlers
can [un]register probes, so the number of registered probes for
a process can bounce between zero and non-zero rather frequently.
This recommends approach #1a.

Jim


* Re: [RFC][PATCH 2/2] uprobes: single-step out of line
  2007-05-07 22:02 ` Jim Keniston
@ 2007-05-09  1:32   ` Ernie Petrides
  2007-05-10 23:17     ` Jim Keniston
  0 siblings, 1 reply; 6+ messages in thread
From: Ernie Petrides @ 2007-05-09  1:32 UTC (permalink / raw)
  To: Jim Keniston; +Cc: Ernie Petrides, Frank Ch. Eigler, Linda Wang, systemtap

On Monday, 7-May-2007 at 14:2 PDT, Jim Keniston wrote:

> On Fri, 2007-05-04 at 21:09 -0400, Ernie Petrides wrote:
>
> > Is there a way to avoid adding the "uprobe_ssol_area" struct into the
> > "mm_struct"?  If so, the uprobes module could be easily back-ported to
> > kABI-frozen distros of the Linux kernel.  If the "mm_struct" ends up
> > getting changed, that changes the "task_struct" layout, thus breaking
> > binary compatibility with 3rd-party kernel modules.  (We're not allowed
> > to do this in RHEL distros.)
>
>
> Hmmm.  It does indeed change the layout of struct mm_struct.  I don't
> see how it changes the layout of task_struct, since task_struct
> contains only pointers to mm_structs.

Ah, my mistake.  You are correct.  But because of how exported symbol
checksums are generated (recursively traversing all depended-on info),
all functions taking (task_struct *) arguments would become incompatible.

> But changing mm_struct itself is bad, right?

Besides the exported symbol versioning issue I've already explained, it
might also be the case that somewhere there is a global (or auto-class)
mm_struct.  (There are a few in the base kernel, but one might argue
that there shouldn't be any in 3rd-party modules.)  If there were one,
and somehow the "runt" mm_struct were referenced by a kernel built with
the uprobes infrastructure changes (expanding the mm_struct), then you
could fetch a bogus "uprobes_ssol_area" pointer off the end of an old struct.

I'm not sure how plausible this is, but it's something to consider.

> An obvious alternative is for uprobes to maintain this pointer
> in one of its own data structures.  Currently, when the last uprobe
> for a process is unregistered, we discard the uprobe_process and
> uprobe_tasks, and the only thing that remains is the pointer to
> the uprobe_ssol_area (in mm_context).  We need to remember that
> pointer in case the process is probed again -- we want to reuse the
> vma.  [...]

Originally, I missed the point about reusing the VMA again later
(following the unregistering of the last probe).  So, I guess you
do have a reasonable need for MM-persistent data.

I'm not sure what the best solution is.  Maybe what you've already
got here is reasonable.  I'd need to study mm_struct compatibility
issues for a while to determine if this would be a deal-breaker in
terms of the kABI issue.  (We have this "#ifndef __GENKSYMS__" hack
that can sometimes be used to accommodate these sorts of structure
additions in a RHEL update to avoid the symbol checksum change, but
it's only viable if there's no true underlying compatibility problem.)
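
For readers unfamiliar with the hack, a minimal sketch of the pattern
follows.  The struct here is a hypothetical stand-in (demo_context is
not the real mm_context_t), and the second struct exists only to show
what the checksum generator would see, since the build defines
__GENKSYMS__ when preprocessing sources for genksyms.

```c
#include <stddef.h>

/* Hypothetical stand-in for an arch context struct; not the real mm_context_t. */
struct demo_context {
	void *ldt;
	void *vdso;
#ifndef __GENKSYMS__
	/*
	 * Hidden when __GENKSYMS__ is defined, so symbol checksums are
	 * computed as if this field did not exist.
	 */
	void *uprobes_ssol_area;
#endif
};

/* The same struct as the checksum generator would see it. */
struct demo_context_as_hashed {
	void *ldt;
	void *vdso;
};
```

The compiled layout still grows by one pointer even though the
checksum is unchanged, which is why the trick is only safe when
nothing outside the rebuilt code depends on the struct's size or on
offsets past the new field.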

Cheers.  -ernie


* Re: [RFC][PATCH 2/2] uprobes: single-step out of line
  2007-05-09  1:32   ` Ernie Petrides
@ 2007-05-10 23:17     ` Jim Keniston
  2007-05-11 22:31       ` Ernie Petrides
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Keniston @ 2007-05-10 23:17 UTC (permalink / raw)
  To: Ernie Petrides; +Cc: Frank Ch. Eigler, Linda Wang, systemtap

On Tue, 2007-05-08 at 21:34 -0400, Ernie Petrides wrote:
> On Monday, 7-May-2007 at 14:2 PDT, Jim Keniston wrote:
> 
> > On Fri, 2007-05-04 at 21:09 -0400, Ernie Petrides wrote:
> >
> > > Is there a way to avoid adding the "uprobe_ssol_area" struct into the
> > > "mm_struct"?  If so, the uprobes module could be easily back-ported to
> > > kABI-frozen distros of the Linux kernel.  If the "mm_struct" ends up
> > > getting changed, that changes the "task_struct" layout, thus breaking
> > > binary compatibility with 3rd-party kernel modules.  (We're not allowed
> > > to do this in RHEL distros.)
> >
> >
> > Hmmm.  It does indeed change the layout of struct mm_struct.  I don't
> > see how it changes the layout of task_struct, since task_struct
> > contains only pointers to mm_structs.
> 
> Ah, my mistake.  You are correct.  But because of how exported symbol
> checksums are generated (recursively traversing all depended-on info),
> all functions taking (task_struct *) arguments would become incompatible.
> 
> 
> 
> > But changing mm_struct itself is bad, right?
> 
> Besides the exported symbol versioning issue I've already explained, it
> might also be the case that somewhere there is a global (or auto-class)
> mm_struct.  (There are a few in the base kernel, but one might argue
> that there shouldn't be any in 3rd-party modules.)  If there were one,
> and somehow the "runt" mm_struct were referenced by a kernel built with
> the uprobes infrastructure changes (expanding the mm_struct), then you
> could fetch a bogus "uprobes_ssol_area" pointer off the end of an old struct.
> 
> I'm not sure how plausible this is, but it's something to consider.
> 
> 
> 
> > An obvious alternative is for uprobes to maintain this pointer
> > in one of its own data structures.  Currently, when the last uprobe
> > for a process is unregistered, we discard the uprobe_process and
> > uprobe_tasks, and the only thing that remains is the pointer to
> > the uprobe_ssol_area (in mm_context).  We need to remember that
> > pointer in case the process is probed again -- we want to reuse the
> > vma.  [...]
> 
> Originally, I missed the point about reusing the VMA again later
> (following the unregistering of the last probe).  So, I guess you
> do have a reasonable need for MM-persistent data.
> 
> I'm not sure what the best solution is.  Maybe what you've already
> got here is reasonable.  I'd need to study mm_struct compatibility
> issues for a while to determine if this would be a deal-breaker in
> terms of the kABI issue.  (We have this "#ifndef __GENKSYMS__" hack
> that can sometimes be used to accommodate these sorts of structure
> additions in a RHEL update to avoid the symbol checksum change, but
> it's only viable if there's no true underlying compatibility problem.)


Yes, I'd appreciate it if you confirm the need for a change here, since
the effort/implications for this change are non-trivial.

> 
> 
> Cheers.  -ernie

Thanks.
Jim


* Re: [RFC][PATCH 2/2] uprobes: single-step out of line
  2007-05-10 23:17     ` Jim Keniston
@ 2007-05-11 22:31       ` Ernie Petrides
  0 siblings, 0 replies; 6+ messages in thread
From: Ernie Petrides @ 2007-05-11 22:31 UTC (permalink / raw)
  To: Jim Keniston; +Cc: Frank Ch. Eigler, Linda Wang, systemtap

On Thursday, 10-May-2007 at 15:17 PDT, Jim Keniston wrote:

> On Tue, 2007-05-08 at 21:34 -0400, Ernie Petrides wrote:
>
> > I'm not sure what the best solution is.  Maybe what you've already
> > got here is reasonable.  I'd need to study mm_struct compatibility
> > issues for a while to determine if this would be a deal-breaker in
> > terms of the kABI issue.  (We have this "#ifndef __GENKSYMS__" hack
> > that can sometimes be used to accommodate these sorts of structure
> > additions in a RHEL update to avoid the symbol checksum change, but
> > it's only viable if there's no true underlying compatibility problem.)
>
>
> Yes, I'd appreciate it if you confirm the need for a change here, since
> the effort/implications for this change are non-trivial.

I've researched "mm_struct" usage in the latest RHEL5 kernel sources,
specifically looking for dependencies on the structure size and on the
offsets to the fields beyond the "mm_context_t".

As far as I can see, all of these dependencies are in the base part of
the kernel (as opposed to modules).  This suggests that the __GENKSYMS__
hack could be used to hide your new "uprobes_ssol_area" field being
added to the "mm_context_t" (to preserve exported symbol versioning)
without causing a true binary compatibility problem.

That being said, I don't represent views of the RHEL5 kernel maintainer
nor any other senior developers who might have to sign off on such a
change in the hypothetical scenario of a uprobes back-port to RHEL5.
But using __GENKSYMS__ for this situation looks safe to me.

Obviously, this is a non-issue for upstream acceptance, since all
sources are expected to be recompiled (and thus there is no attempt
to preserve kABI).  I did notice, however, that the "dumpable" field
of the "mm_struct" comes after the "mm_context_t" upstream in 2.6.21
(unlike in RHEL5).  Some other distro based on a more recent upstream
version could conceivably have an issue with this field, since it's
remotely possible (though unlikely) that a 3rd-party exec format
handler or security module might access "dumpable" (whose field
offset would change with an "mm_context_t" addition).  But this is
a somewhat far-fetched example, so all in all, you're probably okay.
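
To make the offset concern concrete, here is a small sketch with
simplified, hypothetical structs (mm_old, mm_new and the "flags" field
are illustrations only, and "dumpable" is shown as a plain unsigned
int rather than whatever the real declaration is).  It assumes only
that some field follows the embedded context, as described above.

```c
#include <stddef.h>

/* Context before and after the patch (simplified, hypothetical). */
struct ctx_old {
	void *ldt;
	void *vdso;
};

struct ctx_new {
	void *ldt;
	void *vdso;
	void *uprobes_ssol_area;	/* the added field */
};

/* A field placed after the embedded context, as with "dumpable" upstream. */
struct mm_old {
	unsigned long flags;
	struct ctx_old context;
	unsigned int dumpable;
};

struct mm_new {
	unsigned long flags;
	struct ctx_new context;
	unsigned int dumpable;
};

/* How far "dumpable" moves when the context grows by one pointer. */
static long dumpable_shift(void)
{
	return (long)offsetof(struct mm_new, dumpable) -
	       (long)offsetof(struct mm_old, dumpable);
}
```

A module compiled against the old layout would keep using the old
offset, and so would read part of the new pointer instead of
"dumpable".  Fields that precede the context are unaffected.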

Cheers.  -ernie


* [RFC][PATCH 2/2] uprobes: single-step out of line
@ 2007-04-20 23:10 Jim Keniston
  0 siblings, 0 replies; 6+ messages in thread
From: Jim Keniston @ 2007-04-20 23:10 UTC (permalink / raw)
  To: systemtap

[-- Attachment #1: Type: text/plain, Size: 631 bytes --]

This patch enhances uprobes to use "single-stepping out of line"
(SSOL) to avoid probe misses in multithreaded applications.  SSOL also
reduces probe overhead by 25-30%.

This patch creates a 1-page VM area in the probed process for SSOL.
This implementation creates the SSOL area only for processes that
are actually probed.  An older implementation, which preallocates
a SSOL area at exec time, can be resurrected if there's interest.
In the current implementation, the size of the SSOL area is fixed
at 1 page.  Slots in the SSOL area are recycled, as necessary, on a
least-recently-used basis.
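
The recycling policy can be sketched roughly as follows.  This is a
simplified user-space illustration, not the code in the patch: the
names (struct slot, pick_slot) are hypothetical, there is no locking,
and wraparound of the last_used counter is ignored.

```c
#include <stddef.h>

enum slot_state {
	SLOT_FREE,
	SLOT_ASSIGNED
};

/* Hypothetical per-slot bookkeeping, reduced to what the policy needs. */
struct slot {
	enum slot_state state;
	unsigned long last_used;	/* larger means used more recently */
};

/*
 * Return the index of a free slot if there is one; otherwise return the
 * index of the least-recently-used slot.  Returns -1 if nslots is 0.
 */
static int pick_slot(const struct slot *slots, int nslots)
{
	int best = -1;
	int i;

	for (i = 0; i < nslots; i++) {
		if (slots[i].state == SLOT_FREE)
			return i;
		if (best < 0 || slots[i].last_used < slots[best].last_used)
			best = i;
	}
	return best;
}
```

With a fixed one-page area, a full scan like this is bounded by the
number of slots per page, so the linear search is cheap compared with
the cost of a probe hit.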

Comments welcome.

Jim Keniston

[-- Attachment #2: 2-uprobes-ssol.patch --]
[-- Type: text/x-patch, Size: 25203 bytes --]


Uprobes is enhanced to use "single-stepping out of line" (SSOL)
to avoid probe misses in multithreaded applications.  SSOL also
reduces probe overhead by 25-30%.

After a breakpoint has been hit and uprobes has run the probepoint's
handler(s), uprobes must execute the probed instruction in the
context of the probed process.  There are two commonly accepted
ways to do this:

o Single-stepping inline (SSIL): Temporarily replace the breakpoint
instruction with the original instruction; single-step the instruction;
restore the breakpoint instruction; and allow the thread to continue.
This method is typically used by interactive debuggers such as gdb,
and is also used in the uprobes base patch.  This approach doesn't
work acceptably for multithreaded programs, because while the
breakpoint is temporarily removed, other threads can sail past the
probepoint.  It also requires two writes to the probed process's
text for every probe hit.

o Single-stepping out of line (SSOL): Place a copy of the original
instruction somewhere in the probed process's address space;
single-step the copy; fix up the thread state as necessary; and allow
the thread to continue.  This approach is used by kprobes.

This implementation of SSOL entails two major components:

1) Allocation and management of an "SSOL area."  Before handling
the first probe hit, uprobes allocates a VM area in the probed
process's address space, and divides it into "instruction slots."
The first time a probepoint is hit, an instruction slot is allocated
to it and a copy of the probed instruction is placed there.  Multiple
threads can march through that probepoint simultaneously, all using
the same slot.  Currently, we allocate a VM area only for probed
processes (rather than at exec time for every process), its size
is one page, and it never grows.  Slots are recycled, as necessary,
on a least-recently-used basis.

2) Architecture-specific fix-ups for certain instructions.  If the
effect of an instruction depends on its address, the thread's
registers and/or stack must be fixed up after the instruction-copy
is single-stepped.  For i386 uprobes, the fixups were stolen from
i386 kprobes.
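
The address arithmetic behind the common fix-ups can be sketched as
below.  The helper names (fixup_eip, fixup_ret_addr) are hypothetical
and the sketch covers only the arithmetic, assuming "orig" is the
probepoint address and "copy" is the address of the instruction-copy
in the SSOL area; it does not decode instructions.

```c
/*
 * After single-stepping the copy, eip is relative to the copy.
 * Rebase it so that it is relative to the original instruction.
 */
static unsigned long fixup_eip(unsigned long eip, unsigned long orig,
			       unsigned long copy)
{
	return orig + (eip - copy);
}

/*
 * A relative call executed from the copy pushes the address that follows
 * the copy.  Shift it to the address that follows the original.
 */
static unsigned long fixup_ret_addr(unsigned long ra, unsigned long orig,
				    unsigned long copy)
{
	return ra + (orig - copy);
}
```

For example, if a 5-byte instruction at 0x8048000 is copied to 0x7000
and falls through, eip after the single-step is 0x7005 and rebasing
gives 0x8048005.  Instructions that load an absolute eip (returns,
indirect or absolute jumps and calls) need no eip rebasing.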

---

 Documentation/uprobes.txt  |   15 +-
 arch/i386/kernel/Makefile  |    1 
 arch/i386/kernel/uprobes.c |  139 ++++++++++++++++++++
 include/asm-i386/mmu.h     |    1 
 include/asm-i386/uprobes.h |    2 
 include/linux/uprobes.h    |   99 ++++++++++++++
 kernel/uprobes.c           |  303 ++++++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 549 insertions(+), 11 deletions(-)

diff -puN Documentation/uprobes.txt~2-uprobes-ssol Documentation/uprobes.txt
--- linux-2.6.21-rc6/Documentation/uprobes.txt~2-uprobes-ssol	2007-04-20 11:24:32.000000000 -0700
+++ linux-2.6.21-rc6-jimk/Documentation/uprobes.txt	2007-04-20 11:25:27.000000000 -0700
@@ -54,14 +54,13 @@ handler the addresses of the uprobe stru
 The handler may block, but keep in mind that the probed thread remains
 stopped while your handler runs.
 
-Next, Uprobes single-steps the probed instruction and resumes execution
-of the probed process at the instruction following the probepoint.
-[Note: In the base uprobes patch, we temporarily remove the breakpoint
-instruction, insert the original opcode, single-step the instruction
-"inline", and then replace the breakpoint.  This can create problems
-in a multithreaded application.  For example, it opens a time window
-during which another thread can sail right past the probepoint.
-This problem is resolved in the "single-stepping out of line" patch.]
+Next, Uprobes single-steps its copy of the probed instruction and
+resumes execution of the probed process at the instruction following
+the probepoint.  (It would be simpler to single-step the actual
+instruction in place, but then Uprobes would have to temporarily
+remove the breakpoint instruction.  This would create problems in a
+multithreaded application.  For example, it would open a time window
+when another thread could sail right past the probepoint.)
 
 1.2 The Role of Utrace
 
diff -puN arch/i386/kernel/Makefile~2-uprobes-ssol arch/i386/kernel/Makefile
--- linux-2.6.21-rc6/arch/i386/kernel/Makefile~2-uprobes-ssol	2007-04-20 11:24:33.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/kernel/Makefile	2007-04-20 11:25:27.000000000 -0700
@@ -41,6 +41,7 @@ obj-$(CONFIG_EARLY_PRINTK)	+= early_prin
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_STACK_UNWIND)	+= unwind.o
+obj-$(CONFIG_UPROBES)		+= uprobes.o
 
 obj-$(CONFIG_VMI)		+= vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o
diff -puN /dev/null arch/i386/kernel/uprobes.c
--- /dev/null	2007-04-20 13:55:48.505165167 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/kernel/uprobes.c	2007-04-20 11:25:27.000000000 -0700
@@ -0,0 +1,139 @@
+/*
+ *  Userspace Probes (UProbes)
+ *  arch/i386/kernel/uprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/uprobes.h>
+#include <linux/mm.h>
+#include <linux/dcache.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <asm/kdebug.h>
+#include <asm/uprobes.h>
+
+/*
+ * TODO: Better handling of probed process's memory getting messed up.
+ * E.g., use printk() instead of BUG().
+ */
+
+/*
+ * Get an instruction slot from the process's SSOL area, containing the
+ * instruction at uk's probepoint.  Point the eip at that slot, in
+ * preparation for single-stepping out of line.
+ */
+int uprobe_prepare_singlestep(struct uprobe_kimg *uk,
+		struct uprobe_task *utask, struct pt_regs *regs)
+{
+	struct uprobe_ssol_slot *slot;
+
+	slot = uprobe_get_insn_slot(uk);
+	BUG_ON(!slot);
+	regs->eip = (long)slot->insn;
+	utask->singlestep_addr = regs->eip;
+
+	return 0;
+}
+
+/*
+ * Called by uprobe_resume_execution to adjust the return address
+ * pushed by a call instruction executed out-of-line.
+ */
+static void adjust_ret_addr(long esp, long correction)
+{
+	int nleft;
+	long ra;
+
+	nleft = copy_from_user(&ra, (const void __user *) esp, 4);
+	if (unlikely(nleft != 0))
+		goto fail;
+	ra +=  correction;
+	nleft = copy_to_user((void __user *) esp, &ra, 4);
+	if (unlikely(nleft != 0))
+		goto fail;
+	return;
+
+fail:
+	printk(KERN_ERR
+		"uprobes: Failed to adjust return address after"
+		" single-stepping call instruction;"
+		" pid=%d, esp=%#lx\n", current->pid, esp);
+	BUG();
+}
+
+/*
+ * Called after single-stepping.  uk->vaddr is the address of the
+ * instruction whose first byte has been replaced by the "int 3"
+ * instruction.  To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction.  The address of this
+ * copy is utask->singlestep_addr.
+ *
+ * This function prepares to return from the post-single-step
+ * interrupt.  We have to fix up the stack as follows:
+ *
+ * 0) Typically, the new eip is relative to the copied instruction.  We
+ * need to make it relative to the original instruction.  Exceptions are
+ * return instructions and absolute or indirect jump or call instructions.
+ *
+ * 1) If the single-stepped instruction was a call, the return address
+ * that is atop the stack is the address following the copied instruction.
+ * We need to make it the address following the original instruction.
+ */
+void uprobe_resume_execution(struct uprobe_kimg *uk,
+				struct uprobe_task *utask, struct pt_regs *regs)
+{
+	long next_eip = 0;
+	long copy_eip = utask->singlestep_addr;
+	long orig_eip = uk->vaddr;
+
+	switch (uk->insn[0]) {
+	case 0xc3:		/* ret/lret */
+	case 0xcb:
+	case 0xc2:
+	case 0xca:
+		next_eip = regs->eip;
+		/* eip is already adjusted, no more changes required*/
+		break;
+	case 0xe8:		/* call relative - Fix return addr */
+		adjust_ret_addr(regs->esp, (orig_eip - copy_eip));
+		break;
+	case 0xff:
+		if ((uk->insn[1] & 0x30) == 0x10) {
+			/* call absolute, indirect */
+			/* Fix return addr; eip is correct. */
+			next_eip = regs->eip;
+			adjust_ret_addr(regs->esp, (orig_eip - copy_eip));
+		} else if (((uk->insn[1] & 0x31) == 0x20) ||
+			   ((uk->insn[1] & 0x31) == 0x21)) {
+			/* jmp near or jmp far  absolute indirect */
+			/* eip is correct. */
+			next_eip = regs->eip;
+		}
+		break;
+	case 0xea:		/* jmp absolute -- eip is correct */
+		next_eip = regs->eip;
+		break;
+	default:
+		break;
+	}
+
+	if (next_eip)
+		regs->eip = next_eip;
+	else
+		regs->eip = orig_eip + (regs->eip - copy_eip);
+}
diff -puN include/asm-i386/mmu.h~2-uprobes-ssol include/asm-i386/mmu.h
--- linux-2.6.21-rc6/include/asm-i386/mmu.h~2-uprobes-ssol	2007-04-20 11:24:33.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/asm-i386/mmu.h	2007-04-20 11:25:27.000000000 -0700
@@ -13,6 +13,7 @@ typedef struct { 
 	struct semaphore sem;
 	void *ldt;
 	void *vdso;
+	void *uprobes_ssol_area;
 } mm_context_t;
 
 #endif
diff -puN include/asm-i386/uprobes.h~2-uprobes-ssol include/asm-i386/uprobes.h
--- linux-2.6.21-rc6/include/asm-i386/uprobes.h~2-uprobes-ssol	2007-04-20 11:24:33.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/asm-i386/uprobes.h	2007-04-20 11:25:27.000000000 -0700
@@ -23,6 +23,8 @@
 #include <linux/types.h>
 #include <linux/ptrace.h>
 
+#define SS_OUT_OF_LINE
+
 typedef u8 uprobe_opcode_t;
 #define BREAKPOINT_INSTRUCTION	0xcc
 #define BP_INSN_SIZE 1
diff -puN include/linux/uprobes.h~2-uprobes-ssol include/linux/uprobes.h
--- linux-2.6.21-rc6/include/linux/uprobes.h~2-uprobes-ssol	2007-04-20 11:24:33.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/linux/uprobes.h	2007-04-20 11:25:27.000000000 -0700
@@ -32,6 +32,8 @@ struct task_struct;
 struct utrace_attached_engine;
 struct uprobe_kimg;
 struct uprobe;
+struct uprobe_ssol_slot;
+struct uprobe_ssol_area;
 
 /*
  * This is what the user supplies us.
@@ -90,6 +92,70 @@ enum uprobe_task_state {
 
 #define UPROBE_HASH_BITS 5
 #define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS)
+#define UINSNS_PER_PAGE (PAGE_SIZE/MAX_UINSN_BYTES)
+
+/*
+ * The below slot states are just advisory, used when deciding which slot
+ * to steal.
+ */
+enum uprobe_ssol_slot_state {
+	SSOL_FREE,
+	SSOL_ASSIGNED,
+	SSOL_BEING_STOLEN
+};
+
+/*
+ * For a uprobe_process that uses an SSOL area, there's an array of these
+ * objects matching the array of instruction slots in the SSOL area.
+ */
+struct uprobe_ssol_slot {
+	/* The slot in the SSOL area that holds the instruction-copy */
+	__user uprobe_opcode_t	*insn;
+
+	enum uprobe_ssol_slot_state state;
+
+	/* The probepoint that currently owns this slot */
+	struct uprobe_kimg *owner;
+
+	/*
+	 * Read-locked when slot is in use during single-stepping.
+	 * Write-locked by stealing task.
+	 */
+	struct rw_semaphore rwsem;
+
+	/* Used for LRU heuristics.  If this overflows, it's OK. */
+	unsigned long last_used;
+};
+
+enum uprobe_ssol_area_state {
+	SSOL_NOT_SETUP,
+	SSOL_SETUP_OK,
+	SSOL_SETUP_FAILED
+};
+
+/*
+ * The per-process single-stepping out-of-line (SSOL) area
+ */
+struct uprobe_ssol_area {
+	/* Array of instruction slots in the vma we allocate */
+	__user uprobe_opcode_t *insn_area;
+
+	int nslots;
+
+	/* Array of slot objects, one per instruction slot */
+	struct uprobe_ssol_slot *slots;
+
+	/* lock held while finding a free slot */
+	spinlock_t lock;
+
+	/* Next slot to steal */
+	int next_slot;
+
+	enum uprobe_ssol_area_state state;
+
+	/* Ensures 2 threads don't try to set up the vma simultaneously. */
+	struct mutex setup_mutex;
+};
 
 /*
  * uprobe_process -- not a user-visible struct.
@@ -142,6 +208,18 @@ struct uprobe_process {
 	 * since once the last thread has exited, the rest is academic.
 	 */
 	struct kref refcount;
+
+	/*
+	 * Manages slots for instruction-copies to be single-stepped
+	 * out of line.
+	 */
+	struct uprobe_ssol_area ssol_area;
+
+	/*
+	 * 1 to single-step out of line; 0 for inline.  This can drop to
+	 * 0 if we can't set up the SSOL area, but never goes from 0 to 1.
+	 */
+	int sstep_out_of_line;
 };
 
 /*
@@ -187,6 +265,19 @@ struct uprobe_kimg {
 
 	/* [un]register_uprobe() waits 'til bkpt inserted/removed. */
 	wait_queue_head_t waitq;
+
+	/*
+	 * We put the instruction-copy here to single-step it.
+	 * We don't own it unless slot->owner points back to us.
+	 */
+	struct uprobe_ssol_slot *slot;
+
+	/*
+	 * Hold this while stealing an insn slot to ensure that no
+	 * other thread, having also hit this probepoint, simultaneously
+	 * steals a slot for it.
+	 */
+	struct mutex slot_mutex;
 };
 
 /*
@@ -229,6 +320,14 @@ struct uprobe_task {
 int register_uprobe(struct uprobe *u);
 void unregister_uprobe(struct uprobe *u);
 
+#ifdef SS_OUT_OF_LINE
+extern struct uprobe_ssol_slot *uprobe_get_insn_slot(struct uprobe_kimg *uk);
+extern int uprobe_prepare_singlestep(struct uprobe_kimg *uk,
+			struct uprobe_task *utask, struct pt_regs *regs);
+extern void uprobe_resume_execution(struct uprobe_kimg *uk,
+			struct uprobe_task *utask, struct pt_regs *regs);
+#endif
+
 #else	/* CONFIG_UPROBES */
 
 static inline int register_uprobe(struct uprobe *u)
diff -puN kernel/uprobes.c~2-uprobes-ssol kernel/uprobes.c
--- linux-2.6.21-rc6/kernel/uprobes.c~2-uprobes-ssol	2007-04-20 11:24:33.000000000 -0700
+++ linux-2.6.21-rc6-jimk/kernel/uprobes.c	2007-04-20 11:25:27.000000000 -0700
@@ -29,8 +29,10 @@
 #include <linux/utrace.h>
 #include <linux/uprobes.h>
 #include <linux/tracehook.h>
+#include <linux/mm.h>
 #include <asm/tracehook.h>
 #include <asm/errno.h>
+#include <asm/mman.h>
 
 #define SET_ENGINE_FLAGS	1
 #define CLEAR_ENGINE_FLAGS	0
@@ -359,6 +361,7 @@ static int quiesce_all_threads(struct up
 static void uprobe_free_process(struct uprobe_process *uproc)
 {
 	struct uprobe_task *utask, *tmp;
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
 
 	if (!hlist_unhashed(&uproc->hlist))
 		hlist_del(&uproc->hlist);
@@ -375,6 +378,8 @@ static void uprobe_free_process(struct u
 		mutex_unlock(&utask->mutex);
 		kfree(utask);
 	}
+	if (area->slots)
+		kfree(area->slots);
 	mutex_unlock(&uproc->mutex);	// So kfree doesn't complain
 	kfree(uproc);
 }
@@ -503,6 +508,14 @@ static struct uprobe_process *uprobe_mk_
 	INIT_HLIST_NODE(&uproc->hlist);
 	uproc->tgid = p->tgid;
 
+	uproc->ssol_area.state = SSOL_NOT_SETUP;
+	mutex_init(&uproc->ssol_area.setup_mutex);
+#ifdef SS_OUT_OF_LINE
+	uproc->sstep_out_of_line = 1;
+#else
+	uproc->sstep_out_of_line = 0;
+#endif
+
 	/*
 	 * Create and populate one utask per thread in this process.  We
 	 * can't call uprobe_add_task() while holding tasklist_lock, so we:
@@ -554,6 +567,8 @@ static struct uprobe_kimg *uprobe_add_ki
 	init_rwsem(&uk->rwsem);
 	down_write(&uk->rwsem);
 	init_waitqueue_head(&uk->waitq);
+	mutex_init(&uk->slot_mutex);
+	uk->slot = NULL;
 
 	/* Connect to u. */
 	INIT_LIST_HEAD(&uk->uprobe_list);
@@ -582,8 +597,18 @@ static struct uprobe_kimg *uprobe_add_ki
  */
 static void uprobe_free_kimg_locked(struct uprobe_kimg *uk)
 {
+	struct uprobe_ssol_slot *slot = uk->slot;
+
 	hlist_del(&uk->ut_node);
 	uk->uproc->nuk--;
+	if (slot) {
+		down_write(&slot->rwsem);
+		if (slot->owner == uk) {
+			slot->state = SSOL_FREE;
+			slot->owner = NULL;
+		}
+		up_write(&slot->rwsem);
+	}
 	up_write(&uk->rwsem);
 	kfree(uk);
 }
@@ -817,6 +842,233 @@ void unregister_uprobe(struct uprobe *u)
 }
 
 /*
+ * Functions for allocation of the SSOL area, and the instruction slots
+ * therein
+ */
+
+/*
+ * MMap a page for the uprobes SSOL area.  This approach was suggested by
+ * Roland McGrath.
+ */
+static int uprobe_setup_ssol_vma(void)
+{
+	unsigned long addr;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	down_write(&mm->mmap_sem);
+	/*
+	 * Find the end of the top mapping and skip a page.
+	 * If there is no space for PAGE_SIZE above
+	 * that, mmap will ignore our address hint.
+	 */
+	vma = rb_entry(rb_last(&mm->mm_rb), struct vm_area_struct, vm_rb);
+	addr = vma->vm_end + PAGE_SIZE;
+	addr = do_mmap_pgoff(NULL, addr, PAGE_SIZE, PROT_EXEC,
+					MAP_PRIVATE|MAP_ANONYMOUS, 0);
+	if (addr & ~PAGE_MASK) {
+		mm->context.uprobes_ssol_area = ERR_PTR(-ENOMEM);
+		printk(KERN_ERR "Uprobes failed to allocate a vma for"
+			" pid/tgid %d/%d for single-stepping out of line.\n",
+			current->pid, current->tgid);
+		return -1;
+	}
+
+	vma = find_vma(mm, addr);
+	BUG_ON(!vma);
+	/* avoid vma copy on fork() and don't expand when mremap() */
+	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+
+	up_write(&mm->mmap_sem);
+	mm->context.uprobes_ssol_area = (void *)addr;
+	return 0;
+}
+
+/*
+ * Initialize per-process area for single stepping out-of-line.
+ * Must be run by a thread in the probed process.
+ */
+static enum uprobe_ssol_area_state
+			uprobe_init_ssol(struct uprobe_ssol_area *area)
+{
+	struct uprobe_ssol_slot *slot;
+	int i;
+
+	 /*
+	  * If we previously probed this process and then removed all
+	  * probes, the vma is still available to us.
+	  */
+	if (!current->mm->context.uprobes_ssol_area) {
+		if (uprobe_setup_ssol_vma() != 0)
+			return SSOL_SETUP_FAILED;
+	}
+	if (IS_ERR(current->mm->context.uprobes_ssol_area))
+		/* Tried to set it up before, but failed. */
+		return SSOL_SETUP_FAILED;
+
+	area->insn_area = (uprobe_opcode_t *)
+			 current->mm->context.uprobes_ssol_area;
+	area->slots = (struct uprobe_ssol_slot *)
+		kzalloc(sizeof(struct uprobe_ssol_slot) *
+					UINSNS_PER_PAGE, GFP_USER);
+	if (!area->slots)
+		return SSOL_SETUP_FAILED;
+	area->nslots = UINSNS_PER_PAGE;
+	spin_lock_init(&area->lock);
+	area->next_slot = 0;
+	for (i = 0; i < UINSNS_PER_PAGE; i++) {
+		slot = &area->slots[i];
+		init_rwsem(&slot->rwsem);
+		slot->state = SSOL_FREE;
+		slot->owner = NULL;
+		slot->last_used = 0;
+		slot->insn = (uprobe_opcode_t *)
+			((unsigned long)area->insn_area
+			+ (i * MAX_UINSN_BYTES));
+	}
+	return SSOL_SETUP_OK;
+}
+
+/*
+ * Verify that the SSOL area has been set up for uproc.  Returns
+ * SSOL_SETUP_OK or SSOL_SETUP_FAILED.
+ */
+static enum uprobe_ssol_area_state
+			uprobe_verify_ssol(struct uprobe_process *uproc)
+{
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
+
+	if (likely(area->state != SSOL_NOT_SETUP))
+		return area->state;
+	/* First time through */
+	mutex_lock(&area->setup_mutex);
+	if (likely(area->state == SSOL_NOT_SETUP))
+		/* Nobody snuck in and set things up ahead of us. */
+		area->state = uprobe_init_ssol(area);
+	mutex_unlock(&area->setup_mutex);
+	return area->state;
+}
+
+static inline int advance_slot(int slot, struct uprobe_ssol_area *area)
+{
+	return (slot + 1) % area->nslots;
+}
+
+/*
+ * Choose an instruction slot and take it.  Choose a free slot if there is one.
+ * Otherwise choose the least-recently-used slot.  Returns with slot
+ * read-locked and containing the desired instruction.  Runs with
+ * uk->slot_mutex locked.
+ *
+ * TODO: Keep track of the number of free slots.  If none are free, don't
+ * look through the whole SSOL area for the LRU slot; just look at (say)
+ * the next 10 slots and pick the LRU among those.
+ */
+static struct uprobe_ssol_slot *uprobe_lru_insn_slot(struct uprobe_kimg *uk)
+{
+	struct uprobe_process *uproc = uk->uproc;
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
+	struct uprobe_ssol_slot *s;
+	int slot, best_slot, len;
+	unsigned long lru_time, flags;
+
+	spin_lock_irqsave(&area->lock, flags);
+
+	/*
+	 * Find a slot to take.  Start looking at next_slot.  Don't bother
+	 * locking individual slots while we decide.
+	 */
+	best_slot = -1;
+	lru_time = ULONG_MAX;
+	slot = area->next_slot;
+	do {
+		s = &area->slots[slot];
+		if (s->state == SSOL_FREE) {
+			best_slot = slot;
+			break;
+		}
+		if (s->state == SSOL_ASSIGNED && lru_time > s->last_used) {
+			lru_time = s->last_used;
+			best_slot = slot;
+		}
+		slot = advance_slot(slot, area);
+	} while (slot != area->next_slot);
+
+	if (unlikely(best_slot < 0))
+		/* All slots are in the act of being stolen.  Join the melee. */
+		slot = area->next_slot;
+	else
+		slot = best_slot;
+	area->next_slot = advance_slot(slot, area);
+	s = &area->slots[slot];
+	s->state = SSOL_BEING_STOLEN;
+
+	spin_unlock_irqrestore(&area->lock, flags);
+
+	down_write(&s->rwsem);
+	uk->slot = s;
+	s->owner = uk;
+	s->last_used = jiffies;
+	s->state = SSOL_ASSIGNED;
+	/* Copy the original instruction to the chosen slot. */
+	len = access_process_vm(current, (unsigned long)s->insn,
+					 uk->insn, MAX_UINSN_BYTES, 1);
+	if (unlikely(len < MAX_UINSN_BYTES)) {
+		up_write(&s->rwsem);
+		printk(KERN_ERR "Failed to copy instruction at %#lx"
+			" to SSOL area (%#lx)\n", uk->vaddr,
+			(unsigned long) s->insn);
+		return NULL;
+	}
+	/* Let other threads single-step in this slot. */
+	downgrade_write(&s->rwsem);
+	return s;
+}
+
+/* uk doesn't own a slot.  Get one for uk, and return it read-locked. */
+static struct uprobe_ssol_slot *uprobe_find_insn_slot(struct uprobe_kimg *uk)
+{
+	struct uprobe_ssol_slot *slot;
+
+	mutex_lock(&uk->slot_mutex);
+	slot = uk->slot;
+	if (unlikely(slot && slot->owner == uk)) {
+		/* Looks like another thread snuck in and got a slot for us. */
+		down_read(&slot->rwsem);
+		if (likely(slot->owner == uk)) {
+			slot->last_used = jiffies;
+			mutex_unlock(&uk->slot_mutex);
+			return slot;
+		}
+		/* ... but then somebody stole it. */
+		up_read(&slot->rwsem);
+	}
+	slot = uprobe_lru_insn_slot(uk);
+	mutex_unlock(&uk->slot_mutex);
+	return slot;
+}
+
+/*
+ * Ensure that uk owns an instruction slot for single-stepping.
+ * Returns with the slot read-locked and uk->slot pointing at it.
+ */
+struct uprobe_ssol_slot *uprobe_get_insn_slot(struct uprobe_kimg *uk)
+{
+	struct uprobe_ssol_slot *slot = uk->slot;
+
+	if (!slot)
+		return uprobe_find_insn_slot(uk);
+
+	down_read(&slot->rwsem);
+	if (slot->owner != uk) {
+		up_read(&slot->rwsem);
+		return uprobe_find_insn_slot(uk);
+	}
+	slot->last_used = jiffies;
+	return slot;
+}
+
+/*
  * utrace engine report callbacks
  */
 
@@ -908,6 +1160,30 @@ static inline void uprobe_post_ssin(stru
 	}
 }
 
+#ifdef SS_OUT_OF_LINE
+/*
+ * Prepare to single-step uk's probed instruction out of line.
+ * Returns with uk->slot read-locked.
+ */
+static inline void uprobe_pre_ssout(struct uprobe_task *utask,
+	struct uprobe_kimg *uk, struct pt_regs *regs)
+{
+	int ret = uprobe_prepare_singlestep(uk, utask, regs);
+	BUG_ON(ret);
+}
+
+/* Prepare to continue execution after single-stepping out of line. */
+static inline void uprobe_post_ssout(struct uprobe_task *utask,
+	struct uprobe_kimg *uk, struct pt_regs *regs)
+{
+	up_read(&uk->slot->rwsem);
+	uprobe_resume_execution(uk, utask, regs);
+}
+#else
+#define uprobe_pre_ssout(utask, uk, regs) do {} while (0)
+#define uprobe_post_ssout(utask, uk, regs) do {} while (0)
+#endif	/* SS_OUT_OF_LINE */
+
 /*
  * Signal callback:
  *
@@ -947,10 +1223,20 @@ static u32 uprobe_report_signal(struct u
 	if (action != UTRACE_SIGNAL_CORE || info->si_signo != SIGTRAP)
 		goto no_interest;
 
+	/*
+	 * Set up the SSOL area if it's not already there.  We do this
+	 * here because we have to do it before handling the first
+	 * probepoint hit, the probed process has to do it, and this may
+	 * be the first time our probed process runs uprobes code.
+	 */
+	uproc = utask->uproc;
+	if (uproc->sstep_out_of_line &&
+			unlikely(uprobe_verify_ssol(uproc) != SSOL_SETUP_OK))
+		uproc->sstep_out_of_line = 0;
+
 	mutex_lock(&utask->mutex);
 	switch (utask->state) {
 	case UPTASK_RUNNING:
-		uproc = utask->uproc;
 		probept = arch_get_probept(regs);
 		down_read(&uproc->utable_rwsem);
 		uk = find_uprobe(uproc, probept, 1);
@@ -972,7 +1258,10 @@ static u32 uprobe_report_signal(struct u
 
 		utask->state = UPTASK_SSTEP_AFTER_BP;
 		mutex_unlock(&utask->mutex);
-		uprobe_pre_ssin(utask, uk, regs);
+		if (uproc->sstep_out_of_line)
+			uprobe_pre_ssout(utask, uk, regs);
+		else
+			uprobe_pre_ssin(utask, uk, regs);
 		/*
 		 * No other engines must see this signal, and the
 		 * signal shouldn't be passed on either.
@@ -983,7 +1272,10 @@ static u32 uprobe_report_signal(struct u
 	case UPTASK_SSTEP_AFTER_BP:
 		uk = utask->active_probe;
 		BUG_ON(!uk);
-		uprobe_post_ssin(utask, uk);
+		if (uproc->sstep_out_of_line)
+			uprobe_post_ssout(utask, uk, regs);
+		else
+			uprobe_post_ssin(utask, uk);
 
 		utask->active_probe = NULL;
 		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN
@@ -1185,6 +1477,11 @@ static u32 uprobe_report_exit(struct utr
 		if (utask->state == UPTASK_BP_HIT)
 			/* Running handler */
 			up_read(&uk->rwsem);
+		else if (utask->state == UPTASK_SSTEP_AFTER_BP) {
+			/* Single-stepping */
+			if (uk->slot && uk->slot->owner == uk)
+				up_read(&uk->slot->rwsem);
+		}
 		mutex_unlock(&utask->mutex);
 	}
 
_

end of thread, other threads:[~2007-05-11 22:31 UTC | newest]

Thread overview: 6+ messages
2007-05-05  1:07 [RFC][PATCH 2/2] uprobes: single-step out of line Ernie Petrides
2007-05-07 22:02 ` Jim Keniston
2007-05-09  1:32   ` Ernie Petrides
2007-05-10 23:17     ` Jim Keniston
2007-05-11 22:31       ` Ernie Petrides
  -- strict thread matches above, loose matches on Subject: below --
2007-04-20 23:10 Jim Keniston