public inbox for systemtap@sourceware.org
* [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
@ 2006-11-21  6:53 Masami Hiramatsu
  2006-11-21  6:55 ` [RFC][PATCH 1/4][kprobe](djprobe) generalize the length of the instruction Masami Hiramatsu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-21  6:53 UTC (permalink / raw)
  To: Keshavamurthy, Anil S, Ingo Molnar, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi
  Cc: Satoshi Oshima, Hideo Aoki, Yumiko Sugita, Jim Keniston,
	Martin Bligh, Greg Kroah-Hartman

Hi Anil and Ingo,

I integrated the essence of djprobe into kprobes. For this
purpose, I introduced a length member in the kprobe structure.

If you'd like to use it, set that length member to the length of
the instructions which will be replaced by a jump.
(Of course, you also have to check that those instructions are
 relocatable and don't include any jump target.)

There are some limitations if you specify the length.
- You must not specify a post_handler or a break_handler.
 Djprobe doesn't support those handlers.
- You must not modify EIP in the pre_handler.
 Any such modification of EIP is simply ignored.

Some behavior of kprobes also changes.
- If you insert a kprobe at an address where a djprobe is already
 inserted, it becomes one probe of the djprobe's multi-probe.
 In this case, if the kprobe has a post_handler or a break_handler,
 register_kprobe() returns -EEXIST and fails to register it.
- Conversely, if you insert a djprobe at an address where a kprobe
 is already inserted, it becomes one probe of the kprobe's
 multi-probe. This never fails.
- If you insert a kprobe in the middle of the jump code inserted
 by another djprobe, register_kprobe() returns -EEXIST and fails
 to register it.

From the user's point of view, the probe simply appears to be
optimized once commit_kprobes() is invoked, provided the length
member of the kprobe has been set. So I call this Direct Jump
Optimized kprobes ("djprobe" for short).
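The intended call sequence can be sketched in userspace C. This is a mock, not the kernel API: the real struct kprobe and register_kprobe() live in <linux/kprobes.h>, and check_djprobe_constraints() is a hypothetical helper that only illustrates the handler restrictions described above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock of the djprobe usage rules. Field names follow the
 * series; everything else here is illustrative. */
struct kprobe;
struct pt_regs;
typedef int (*kprobe_handler_t)(struct kprobe *, struct pt_regs *);

struct kprobe {
	void *addr;                     /* probed address */
	unsigned int length;            /* bytes replaced by the jump; 0 = plain kprobe */
	kprobe_handler_t pre_handler;   /* must not modify EIP when length != 0 */
	kprobe_handler_t post_handler;  /* must be NULL when length != 0 */
	kprobe_handler_t break_handler; /* must be NULL when length != 0 */
};

/* Hypothetical handler used only for the illustration below. */
static int dummy_handler(struct kprobe *p, struct pt_regs *regs)
{
	(void)p; (void)regs;
	return 0;
}

/* Mirrors the constraint: an optimized probe may only carry a
 * pre_handler; the series rejects anything else. */
static int check_djprobe_constraints(const struct kprobe *p)
{
	if (p->length && (p->post_handler || p->break_handler))
		return -1;
	return 0;
}
```

A caller would fill in `length` after verifying the replaced instructions are relocatable, then register the probe and later invoke commit_kprobes() to trigger the optimization.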

NOTE: The patches following this mail depend on my previous patch
(kprobes-enable-booster-on-the-preemptible-kernel.patch).
http://sources.redhat.com/ml/systemtap/2006-q4/msg00453.html

Best Regards,

P.S. Kretprobes are also optimized by this patch series.


---
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




* [RFC][PATCH 1/4][kprobe](djprobe) generalize the length of the instruction
  2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
@ 2006-11-21  6:55 ` Masami Hiramatsu
  2006-11-21  6:56 ` [RFC][PATCH 2/4][kprobe](djprobe) Direct jump optimized kprobes core patch Masami Hiramatsu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-21  6:55 UTC (permalink / raw)
  To: Masami Hiramatsu, Keshavamurthy, Anil S, Ingo Molnar, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi
  Cc: Satoshi Oshima, Hideo Aoki, Yumiko Sugita, Jim Keniston,
	Martin Bligh, Greg Kroah-Hartman

Hi,

This patch generalizes get/free_insn_slot() to manage instruction
slots of multiple lengths, because djprobe (Direct Jump Optimized
kprobe) needs longer instruction slots than ordinary kprobes do.
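The per-size slot bookkeeping can be sketched in userspace. This mirrors the arithmetic in the patch: slots-per-page is derived from the slot size, and the slot_used[] bitmap is a one-element tail sized at allocation time (the `size` macro argument is parenthesized here for hygiene; alloc_insn_page_desc() is an illustrative name).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
typedef unsigned char kprobe_opcode_t;   /* one byte on i386 */

/* Slots-per-page for a given slot size, as in the patch. */
#define INSNS_PER_PAGE(size) \
	(PAGE_SIZE / ((size) * sizeof(kprobe_opcode_t)))

/* Page descriptor with the trailing slot_used[] bitmap sized at
 * allocation time, mirroring the kmalloc() in the patch. */
struct kprobe_insn_page {
	int nused;
	int ngarbage;
	char slot_used[1];               /* really INSNS_PER_PAGE(size) long */
};

static struct kprobe_insn_page *alloc_insn_page_desc(size_t slot_size)
{
	size_t max_insn = INSNS_PER_PAGE(slot_size);
	struct kprobe_insn_page *kip =
		malloc(sizeof(*kip) + sizeof(char) * (max_insn - 1));

	if (!kip)
		return NULL;
	memset(kip->slot_used, 0, max_insn);
	kip->slot_used[0] = 1;           /* hand out slot 0, as the patch does */
	kip->nused = 1;
	kip->ngarbage = 0;
	return kip;
}
```

With a 16-byte slot (MAX_INSN_SIZE on i386) a page holds 256 slots; djprobe's longer stub slots simply yield fewer slots per page.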

Thanks,

---
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


---
 include/linux/kprobes.h |    6 +++
 kernel/kprobes.c        |   85 ++++++++++++++++++++++++++++++------------------
 2 files changed, 60 insertions(+), 31 deletions(-)

Index: linux-2.6.19-rc5-mm2/include/linux/kprobes.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/kprobes.h
+++ linux-2.6.19-rc5-mm2/include/linux/kprobes.h
@@ -157,6 +157,12 @@ struct kretprobe_instance {
 	struct task_struct *task;
 };

+struct kprobe_insn_page_list {
+	struct hlist_head list;
+	int insn_size;		/* size of an instruction slot */
+	int nr_garbage;
+};
+
 extern spinlock_t kretprobe_lock;
 extern struct mutex kprobe_mutex;
 extern int arch_prepare_kprobe(struct kprobe *p);
Index: linux-2.6.19-rc5-mm2/kernel/kprobes.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/kprobes.c
+++ linux-2.6.19-rc5-mm2/kernel/kprobes.c
@@ -77,19 +77,23 @@ static struct notifier_block kprobe_page
  * stepping on the instruction on a vmalloced/kmalloced/data page
  * is a recipe for disaster
  */
-#define INSNS_PER_PAGE	(PAGE_SIZE/(MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
+#define INSNS_PER_PAGE(size)	(PAGE_SIZE/(size * sizeof(kprobe_opcode_t)))

 struct kprobe_insn_page {
 	struct hlist_node hlist;
 	kprobe_opcode_t *insns;		/* Page of instruction slots */
-	char slot_used[INSNS_PER_PAGE];
 	int nused;
 	int ngarbage;
+	char slot_used[1];
 };

-static struct hlist_head kprobe_insn_pages;
-static int kprobe_garbage_slots;
-static int collect_garbage_slots(void);
+static struct kprobe_insn_page_list kprobe_insn_pages = {
+	.list = HLIST_HEAD_INIT,
+	.insn_size = MAX_INSN_SIZE,
+	.nr_garbage = 0,
+};
+
+static int collect_garbage_slots(struct kprobe_insn_page_list *plist);

 static int __kprobes check_safety(void)
 {
@@ -116,37 +120,41 @@ loop_end:
 }

 /**
- * get_insn_slot() - Find a slot on an executable page for an instruction.
+ * __get_insn_slot() - Find a slot on an executable page for an instruction.
  * We allocate an executable page if there's no room on existing ones.
  */
-kprobe_opcode_t __kprobes *get_insn_slot(void)
+kprobe_opcode_t __kprobes *__get_insn_slot(struct kprobe_insn_page_list *plist)
 {
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;
+	int max_insn = INSNS_PER_PAGE(plist->insn_size);

       retry:
-	hlist_for_each(pos, &kprobe_insn_pages) {
+	hlist_for_each(pos, &plist->list) {
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
-		if (kip->nused < INSNS_PER_PAGE) {
+		if (kip->nused < max_insn) {
 			int i;
-			for (i = 0; i < INSNS_PER_PAGE; i++) {
+			for (i = 0; i < max_insn; i++) {
 				if (!kip->slot_used[i]) {
 					kip->slot_used[i] = 1;
 					kip->nused++;
-					return kip->insns + (i * MAX_INSN_SIZE);
+					return kip->insns +
+						(i * plist->insn_size);
 				}
 			}
 			/* Surprise!  No unused slots.  Fix kip->nused. */
-			kip->nused = INSNS_PER_PAGE;
+			kip->nused = max_insn;
 		}
 	}

 	/* If there are any garbage slots, collect it and try again. */
-	if (kprobe_garbage_slots && collect_garbage_slots() == 0) {
+	if (plist->nr_garbage > 0 && collect_garbage_slots(plist) == 0) {
 		goto retry;
 	}
 	/* All out of space.  Need to allocate a new page. Use slot 0. */
-	kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
+	kip = kmalloc(sizeof(struct kprobe_insn_page) +
+		      sizeof(char) * (max_insn - 1),
+		      GFP_KERNEL);
 	if (!kip) {
 		return NULL;
 	}
@@ -162,8 +170,8 @@ kprobe_opcode_t __kprobes *get_insn_slot
 		return NULL;
 	}
 	INIT_HLIST_NODE(&kip->hlist);
-	hlist_add_head(&kip->hlist, &kprobe_insn_pages);
-	memset(kip->slot_used, 0, INSNS_PER_PAGE);
+	hlist_add_head(&kip->hlist, &plist->list);
+	memset(kip->slot_used, 0, max_insn);
 	kip->slot_used[0] = 1;
 	kip->nused = 1;
 	kip->ngarbage = 0;
@@ -171,7 +179,8 @@ kprobe_opcode_t __kprobes *get_insn_slot
 }

 /* Return 1 if all garbages are collected, otherwise 0. */
-static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
+static int __kprobes collect_one_slot(struct kprobe_insn_page_list *plist,
+				      struct kprobe_insn_page *kip, int idx)
 {
 	kip->slot_used[idx] = 0;
 	kip->nused--;
@@ -183,10 +192,10 @@ static int __kprobes collect_one_slot(st
 		 * next time somebody inserts a probe.
 		 */
 		hlist_del(&kip->hlist);
-		if (hlist_empty(&kprobe_insn_pages)) {
+		if (hlist_empty(&plist->list)) {
 			INIT_HLIST_NODE(&kip->hlist);
 			hlist_add_head(&kip->hlist,
-				       &kprobe_insn_pages);
+				       &plist->list);
 		} else {
 			module_free(NULL, kip->insns);
 			kfree(kip);
@@ -196,54 +205,68 @@ static int __kprobes collect_one_slot(st
 	return 0;
 }

-static int __kprobes collect_garbage_slots(void)
+static int __kprobes
+	collect_garbage_slots(struct kprobe_insn_page_list *plist)
 {
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos, *next;
+	int max_insn = INSNS_PER_PAGE(plist->insn_size);

 	/* Ensure no-one is preepmted on the garbages */
 	if (check_safety() != 0)
 		return -EAGAIN;

-	hlist_for_each_safe(pos, next, &kprobe_insn_pages) {
+	hlist_for_each_safe(pos, next, &plist->list) {
 		int i;
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
 		if (kip->ngarbage == 0)
 			continue;
 		kip->ngarbage = 0;	/* we will collect all garbages */
-		for (i = 0; i < INSNS_PER_PAGE; i++) {
+		for (i = 0; i < max_insn; i++) {
 			if (kip->slot_used[i] == -1 &&
-			    collect_one_slot(kip, i))
+			    collect_one_slot(plist, kip, i))
 				break;
 		}
 	}
-	kprobe_garbage_slots = 0;
+	plist->nr_garbage = 0;
 	return 0;
 }

-void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
+void __kprobes __free_insn_slot(struct kprobe_insn_page_list *plist,
+				kprobe_opcode_t * slot, int dirty)
 {
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;
+	int max_insn = INSNS_PER_PAGE(plist->insn_size);

-	hlist_for_each(pos, &kprobe_insn_pages) {
+	hlist_for_each(pos, &plist->list) {
 		kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
 		if (kip->insns <= slot &&
-		    slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
-			int i = (slot - kip->insns) / MAX_INSN_SIZE;
+		    slot < kip->insns + (max_insn * plist->insn_size)) {
+			int i = (slot - kip->insns) / plist->insn_size;
 			if (dirty) {
 				kip->slot_used[i] = -1;
 				kip->ngarbage++;
 			} else {
-				collect_one_slot(kip, i);
+				collect_one_slot(plist, kip, i);
 			}
 			break;
 		}
 	}
-	if (dirty && (++kprobe_garbage_slots > INSNS_PER_PAGE)) {
-		collect_garbage_slots();
+	if (dirty && (++plist->nr_garbage > max_insn)) {
+		collect_garbage_slots(plist);
 	}
 }
+
+kprobe_opcode_t __kprobes *get_insn_slot(void)
+{
+	return __get_insn_slot(&kprobe_insn_pages);
+}
+
+void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
+{
+	__free_insn_slot(&kprobe_insn_pages, slot, dirty);
+}
 #endif

 /* We have preemption disabled.. so it is safe to use __ versions */



* [RFC][PATCH 2/4][kprobe](djprobe) Direct jump optimized kprobes core  patch
  2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
  2006-11-21  6:55 ` [RFC][PATCH 1/4][kprobe](djprobe) generalize the length of the instruction Masami Hiramatsu
@ 2006-11-21  6:56 ` Masami Hiramatsu
  2006-11-21  6:57 ` [RFC][PATCH 3/4][kprobe](djprobe) djprobe for i386 architecture code Masami Hiramatsu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-21  6:56 UTC (permalink / raw)
  To: Keshavamurthy, Anil S, Ingo Molnar, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi
  Cc: Masami Hiramatsu, Satoshi Oshima, Hideo Aoki, Yumiko Sugita,
	Jim Keniston, Martin Bligh, Greg Kroah-Hartman

Hi,

This patch is the architecture-independent part of djprobe.

In this patch, both jump insertion and buffer releasing are
performed by the commit_kprobes() batch function. On a preemptible
kernel, this batch function freezes processes while inserting jumps
and releasing buffers, so we can do both safely.
(The safety-check routine is provided by the previous patch:
http://sources.redhat.com/ml/systemtap/2006-q4/msg00453.html)
In this patch, commit_kprobes() doesn't do anything to
unoptimized kprobes, but I think the same method could be used to
speed up unregistration of kprobes.

Note that the handlers of kprobes whose length member is set are
activated/deactivated as soon as register_kprobe()/
unregister_kprobe() returns. Until commit_kprobes() is invoked,
these handlers are dispatched through normal (breakpoint-based)
kprobes.
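The batch-commit pattern can be sketched in userspace C. This is a simplified simulation, not kernel code: pending instances sit on a registering list, every staged jump destination is written first, the CPUs are serialized once for the whole batch, and only then is each jump completed and the list drained. The names mirror the series; the list handling and states are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* States of a probe's jump site in this simulation. */
enum stub_state { DJ_PENDING, DJ_PREOPT, DJ_OPTIMIZED };

struct djprobe_instance {
	enum stub_state state;
	struct djprobe_instance *next;   /* singly linked "registering" list */
};

static struct djprobe_instance *registering_list;
static int serialize_calls;

static void register_djprobe(struct djprobe_instance *d)
{
	d->state = DJ_PENDING;           /* armed as a normal kprobe for now */
	d->next = registering_list;
	registering_list = d;
}

static void arch_serialize_cpus(void) { serialize_calls++; }

static void commit_djprobes(void)
{
	struct djprobe_instance *d;

	if (!registering_list)
		return;
	/* phase 1: stage every pending jump destination */
	for (d = registering_list; d; d = d->next)
		d->state = DJ_PREOPT;
	arch_serialize_cpus();           /* one sync for the whole batch */
	/* phase 2: complete each jump and drain the list */
	while ((d = registering_list) != NULL) {
		registering_list = d->next;
		d->next = NULL;
		d->state = DJ_OPTIMIZED;
	}
}
```

Batching keeps the expensive cross-CPU serialization to one round per commit no matter how many probes are pending, which is the point of deferring the jump insertion to commit_kprobes().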

Thanks,

-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

---
 include/linux/kprobes.h |   44 ++++++++
 kernel/kprobes.c        |  263 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 290 insertions(+), 17 deletions(-)
Index: linux-2.6.19-rc5-mm2/include/linux/kprobes.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/kprobes.h
+++ linux-2.6.19-rc5-mm2/include/linux/kprobes.h
@@ -28,6 +28,9 @@
  * 2005-May	Hien Nguyen <hien@us.ibm.com> and Jim Keniston
  *		<jkenisto@us.ibm.com>  and Prasanna S Panchamukhi
  *		<prasanna@in.ibm.com> added function-return probes.
+ * 2006-Nov	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> and
+ *		Satoshi Oshima <soshima@redhat.com> added direct-jump
+ *		optimization (a.k.a. djprobe).
  */
 #include <linux/list.h>
 #include <linux/notifier.h>
@@ -83,6 +86,9 @@ struct kprobe {
 	/* Offset into the symbol */
 	unsigned int offset;

+	/* Length of the replaced instructions for direct jump optimization */
+	unsigned int length;
+
 	/* Called before addr is executed. */
 	kprobe_pre_handler_t pre_handler;

@@ -173,6 +179,39 @@ extern void show_registers(struct pt_reg
 extern kprobe_opcode_t *get_insn_slot(void);
 extern void free_insn_slot(kprobe_opcode_t *slot, int dirty);
 extern void kprobes_inc_nmissed_count(struct kprobe *p);
+extern kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_page_list *pages);
+extern void __free_insn_slot(struct kprobe_insn_page_list *pages,
+			     kprobe_opcode_t * slot, int dirty);
+extern int __kprobes aggr_pre_handler(struct kprobe *p, struct pt_regs *regs);
+
+#ifdef ARCH_SUPPORTS_DJPROBES
+/*
+ * Direct jump optimized probe
+ * Note:
+ * User can optimize his kprobe to reduce "break" overhead. He must
+ * analyze target instructions, count its length and confirm those
+ * instructions are relocatable. After that, set the length member of
+ * the kprobe to that length. If unsure, clear length member.
+ */
+struct djprobe_instance {
+	struct kprobe kp;
+	struct list_head list;	/* list for commitment */
+	struct arch_djprobe_stub stub;
+};
+
+/* architecture dependent functions for direct jump optimization */
+extern int arch_prepare_djprobe_instance(struct djprobe_instance *djpi);
+extern void arch_release_djprobe_instance(struct djprobe_instance *djpi);
+extern void arch_preoptimize_djprobe_instance(struct djprobe_instance *djpi);
+extern void arch_optimize_djprobe_instance(struct djprobe_instance *djpi);
+extern void arch_unoptimize_djprobe_instance(struct djprobe_instance *djpi);
+extern void arch_serialize_cpus(void);
+extern int arch_switch_to_stub(struct djprobe_instance *djpi,
+			       struct pt_regs * regs);
+#else
+#define MIN_KPROBE_LENGTH 0
+#define MAX_KPROBE_LENGTH 0
+#endif

 /* Get the kprobe at this addr (if any) - called with preemption disabled */
 struct kprobe *get_kprobe(void *addr);
@@ -196,6 +235,7 @@ static inline struct kprobe_ctlblk *get_

 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
+int commit_kprobes(void);
 int setjmp_pre_handler(struct kprobe *, struct pt_regs *);
 int longjmp_break_handler(struct kprobe *, struct pt_regs *);
 int register_jprobe(struct jprobe *p);
@@ -226,6 +266,10 @@ static inline int register_kprobe(struct
 static inline void unregister_kprobe(struct kprobe *p)
 {
 }
+static inline int commit_kprobes(void)
+{
+	return -ENOSYS;
+}
 static inline int register_jprobe(struct jprobe *p)
 {
 	return -ENOSYS;
Index: linux-2.6.19-rc5-mm2/kernel/kprobes.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/kprobes.c
+++ linux-2.6.19-rc5-mm2/kernel/kprobes.c
@@ -30,6 +30,9 @@
  * 2005-May	Hien Nguyen <hien@us.ibm.com>, Jim Keniston
  *		<jkenisto@us.ibm.com> and Prasanna S Panchamukhi
  *		<prasanna@in.ibm.com> added function-return probes.
+ * 2006-Nov	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> and
+ *		Satoshi Oshima <soshima@redhat.com> added direct-jump
+ *		optimization (a.k.a. djprobe).
  */
 #include <linux/kprobes.h>
 #include <linux/hash.h>
@@ -304,7 +307,7 @@ struct kprobe __kprobes *get_kprobe(void
  * Aggregate handlers for multiple kprobes support - these handlers
  * take care of invoking the individual kprobe handlers on p->list
  */
-static int __kprobes aggr_pre_handler(struct kprobe *p, struct pt_regs *regs)
+int __kprobes aggr_pre_handler(struct kprobe *p, struct pt_regs *regs)
 {
 	struct kprobe *kp;

@@ -363,11 +366,26 @@ static int __kprobes aggr_break_handler(
 	return ret;
 }

+static int __kprobes djprobe_pre_handler(struct kprobe *kp,
+					 struct pt_regs *regs);
+
+/* return true if the kprobe is an aggregator */
+static inline int is_aggr_kprobe(struct kprobe *p)
+{
+	return p->pre_handler == aggr_pre_handler ||
+		p->pre_handler == djprobe_pre_handler;
+}
+/* return true if the kprobe is a djprobe */
+static inline int is_djprobe(struct kprobe *p)
+{
+	return p->pre_handler == djprobe_pre_handler;
+}
+
 /* Walks the list and increments nmissed count for multiprobe case */
 void __kprobes kprobes_inc_nmissed_count(struct kprobe *p)
 {
 	struct kprobe *kp;
-	if (p->pre_handler != aggr_pre_handler) {
+	if (!is_aggr_kprobe(p)) {
 		p->nmissed++;
 	} else {
 		list_for_each_entry_rcu(kp, &p->list, list)
@@ -482,6 +500,7 @@ static inline void copy_kprobe(struct kp
 {
 	memcpy(&p->opcode, &old_p->opcode, sizeof(kprobe_opcode_t));
 	memcpy(&p->ainsn, &old_p->ainsn, sizeof(struct arch_specific_insn));
+	p->length = old_p->length;
 }

 /*
@@ -534,7 +553,10 @@ static int __kprobes register_aggr_kprob
 	int ret = 0;
 	struct kprobe *ap;

-	if (old_p->pre_handler == aggr_pre_handler) {
+	if (is_aggr_kprobe(old_p)) {
+		if (is_djprobe(old_p) &&
+		    (p->break_handler || p->post_handler))
+			return -EEXIST;
 		copy_kprobe(old_p, p);
 		ret = add_new_kprobe(old_p, p);
 	} else {
@@ -556,6 +578,209 @@ static int __kprobes in_kprobes_function
 	return 0;
 }

+/* Called with kprobe_mutex held */
+static int __kprobes __register_kprobe_core(struct kprobe *p)
+{
+	int ret;
+	if ((ret = arch_prepare_kprobe(p)) != 0)
+		return ret;
+
+	INIT_HLIST_NODE(&p->hlist);
+	hlist_add_head_rcu(&p->hlist,
+			   &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
+
+	if (atomic_add_return(1, &kprobe_count) == \
+				(ARCH_INACTIVE_KPROBE_COUNT + 1))
+		register_page_fault_notifier(&kprobe_page_fault_nb);
+
+	arch_arm_kprobe(p);
+	return 0;
+}
+
+#ifdef ARCH_SUPPORTS_DJPROBES
+static LIST_HEAD(registering_list);
+static LIST_HEAD(unregistering_list);
+
+/* Switch to stub buffer : this handler is invoked before inserting a jump */
+static int __kprobes djprobe_pre_handler(struct kprobe *kp,
+					 struct pt_regs *regs)
+{
+	struct djprobe_instance *djpi;
+	djpi = container_of(kp, struct djprobe_instance, kp);
+	return arch_switch_to_stub(djpi, regs);
+}
+
+static int __kprobes detect_probe_collision(struct kprobe *p)
+{
+	int i;
+	struct kprobe *list_p;
+	/* check collision with other optimized kprobes */
+	for (i = 1; i < MAX_KPROBE_LENGTH; i++) {
+		list_p = get_kprobe((char *)p->addr - i);
+		if (list_p) {
+			if (list_p->length > i) {
+				/* other djprobes partially covered */
+				return -EEXIST;
+			}
+			break;
+		}
+	}
+	/* check collision with other kprobes */
+	for (i = 1; i < p->length; i++) {
+		if (get_kprobe((char *)p->addr + i)) {
+			p->length = 0;	/* not optimizable */
+			break;
+		}
+	}
+	return 0;
+}
+
+static int __kprobes register_djprobe(struct kprobe *p)
+{
+	int ret = 0;
+	struct djprobe_instance *djpi;
+
+	if (p->break_handler || p->post_handler)
+		return -EINVAL;
+
+	/* allocate a new instance */
+	djpi = kzalloc(sizeof(struct djprobe_instance), GFP_KERNEL);
+	if (djpi == NULL) {
+		return -ENOMEM;
+	}
+	djpi->kp.addr = p->addr;
+	djpi->kp.length = p->length;
+	djpi->kp.pre_handler = djprobe_pre_handler;
+	INIT_LIST_HEAD(&djpi->list);
+	INIT_LIST_HEAD(&djpi->kp.list);
+
+	/* allocate and initialize stub */
+	if ((ret = arch_prepare_djprobe_instance(djpi)) < 0) {
+		kfree(djpi);
+		goto out;
+	}
+
+	/* register as a kprobe */
+	if ((ret = __register_kprobe_core(&djpi->kp)) != 0) {
+		arch_release_djprobe_instance(djpi);
+		kfree(djpi);
+		goto out;
+	}
+	list_add(&djpi->list, &registering_list);
+
+	register_aggr_kprobe(&djpi->kp, p);
+out:
+	return ret;
+}
+
+static void __kprobes unoptimize_djprobe(struct kprobe *p)
+{
+	struct djprobe_instance *djpi;
+	djpi = container_of(p, struct djprobe_instance, kp);
+	if (!list_empty(&djpi->list)) {
+		/* not committed yet */
+		list_del_init(&djpi->list);
+	} else {
+		/* unoptimize probe point */
+		arch_unoptimize_djprobe_instance(djpi);
+	}
+}
+
+static void __kprobes unregister_djprobe(struct kprobe *p)
+{
+	/* Djprobes will be freed by
+	 * commit_kprobes.
+	 */
+	struct djprobe_instance *djpi;
+	djpi = container_of(p, struct djprobe_instance, kp);
+	list_add(&djpi->list, &unregistering_list);
+}
+
+/* Called with kprobe_mutex held */
+static int __kprobes __commit_djprobes(void)
+{
+	struct djprobe_instance *djpi;
+
+	if (list_empty(&registering_list) &&
+	    list_empty(&unregistering_list)) {
+		return 0;
+	}
+
+	/* ensure safety */
+	if (check_safety() != 0) return -EAGAIN;
+
+	/* optimize probes */
+	if (!list_empty(&registering_list)) {
+		list_for_each_entry(djpi, &registering_list, list) {
+			arch_preoptimize_djprobe_instance(djpi);
+		}
+		arch_serialize_cpus();
+		while (!list_empty(&registering_list)) {
+			djpi = list_entry(registering_list.next,
+					  struct djprobe_instance, list);
+			list_del_init(&djpi->list);
+			arch_optimize_djprobe_instance(djpi);
+		}
+	}
+	/* release code buffer */
+	while (!list_empty(&unregistering_list)) {
+		djpi = list_entry(unregistering_list.next,
+				  struct djprobe_instance, list);
+		list_del(&djpi->list);
+		arch_release_djprobe_instance(djpi);
+		kfree(djpi);
+	}
+	return 0;
+}
+
+/*
+ * Commit to optimize kprobes and remove optimized kprobes.
+ */
+int __kprobes commit_kprobes(void)
+{
+	int ret = 0;
+	mutex_lock(&kprobe_mutex);
+	ret = __commit_djprobes();
+	mutex_unlock(&kprobe_mutex);
+	return ret;
+}
+
+#else	/* ARCH_SUPPORTS_DJPROBES */
+
+static int __kprobes djprobe_pre_handler(struct kprobe *kp,
+					 struct pt_regs *regs)
+{
+	return 0;
+}
+
+static inline int detect_probe_collision(struct kprobe *p)
+{
+	return 0;
+}
+
+int __kprobes commit_kprobes(void)
+{
+	return 0;
+}
+
+static inline int register_djprobe(struct kprobe *p)
+{
+	p->length = 0;	/* Disable direct jump optimization */
+	return __register_kprobe_core(p);
+}
+
+static inline void unoptimize_djprobe(struct kprobe *p)
+{
+	return;
+}
+
+static inline void unregister_djprobe(struct kprobe *p)
+{
+	kfree(p);
+}
+
+#endif	/* ARCH_SUPPORTS_DJPROBES */
+
 static int __kprobes __register_kprobe(struct kprobe *p,
 	unsigned long called_from)
 {
@@ -574,7 +799,8 @@ static int __kprobes __register_kprobe(s
 		kprobe_lookup_name(p->symbol_name, p->addr);
 	}

-	if (!p->addr)
+	if (!p->addr || (p->length &&
+	    (p->length < MIN_KPROBE_LENGTH || p->length > MAX_KPROBE_LENGTH)))
 		return -EINVAL;
 	p->addr = (kprobe_opcode_t *)(((char *)p->addr)+ p->offset);

@@ -608,19 +834,14 @@ static int __kprobes __register_kprobe(s
 		goto out;
 	}

-	if ((ret = arch_prepare_kprobe(p)) != 0)
+	if ((ret = detect_probe_collision(p)) != 0)
 		goto out;

-	INIT_HLIST_NODE(&p->hlist);
-	hlist_add_head_rcu(&p->hlist,
-		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
-
-	if (atomic_add_return(1, &kprobe_count) == \
-				(ARCH_INACTIVE_KPROBE_COUNT + 1))
-		register_page_fault_notifier(&kprobe_page_fault_nb);
-
-	arch_arm_kprobe(p);
-
+	if (p->length) {
+		ret = register_djprobe(p);
+		goto out;
+	}
+	ret = __register_kprobe_core(p);
 out:
 	mutex_unlock(&kprobe_mutex);

@@ -656,10 +877,13 @@ void __kprobes unregister_kprobe(struct
 		return;
 	}
 valid_p:
-	if ((old_p == p) || ((old_p->pre_handler == aggr_pre_handler) &&
+	if ((old_p == p) || (is_aggr_kprobe(old_p) &&
 		(p->list.next == &old_p->list) &&
 		(p->list.prev == &old_p->list))) {
 		/* Only probe on the hash list */
+		if (is_djprobe(old_p)) {
+			unoptimize_djprobe(old_p);
+		}
 		arch_disarm_kprobe(p);
 		hlist_del_rcu(&old_p->hlist);
 		cleanup_p = 1;
@@ -678,7 +902,11 @@ valid_p:
 	if (cleanup_p) {
 		if (p != old_p) {
 			list_del_rcu(&p->list);
-			kfree(old_p);
+			if (is_djprobe(old_p)) {
+				unregister_djprobe(old_p);
+			} else {
+				kfree(old_p);
+			}
 		}
 		arch_remove_kprobe(p);
 	} else {
@@ -836,6 +1064,7 @@ __initcall(init_kprobes);

 EXPORT_SYMBOL_GPL(register_kprobe);
 EXPORT_SYMBOL_GPL(unregister_kprobe);
+EXPORT_SYMBOL_GPL(commit_kprobes);
 EXPORT_SYMBOL_GPL(register_jprobe);
 EXPORT_SYMBOL_GPL(unregister_jprobe);
 EXPORT_SYMBOL_GPL(jprobe_return);


* [RFC][PATCH 3/4][kprobe](djprobe) djprobe for i386 architecture code
  2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
  2006-11-21  6:55 ` [RFC][PATCH 1/4][kprobe](djprobe) generalize the length of the instruction Masami Hiramatsu
  2006-11-21  6:56 ` [RFC][PATCH 2/4][kprobe](djprobe) Direct jump optimized kprobes core patch Masami Hiramatsu
@ 2006-11-21  6:57 ` Masami Hiramatsu
  2006-11-21  6:59 ` [RFC][PATCH 4/4][kprobe](djprobe) delayed invoking commit_kprobes() Masami Hiramatsu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-21  6:57 UTC (permalink / raw)
  To: Keshavamurthy, Anil S, Ingo Molnar, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi
  Cc: Masami Hiramatsu, Satoshi Oshima, Hideo Aoki, Yumiko Sugita,
	Jim Keniston, Martin Bligh, Greg Kroah-Hartman

Hi,

This patch is the i386 architecture-dependent part of djprobe.
It is merged completely into arch/i386/kernel/kprobes.c and
include/asm-i386/kprobes.h.
I also modified the stub code template to use fastcall.
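The displacement arithmetic used by set_jmp_op() in this patch can be reproduced in userspace. This sketch only fills a buffer (the kernel version patches live kernel text and must then serialize the CPUs): the rel32 operand of a jmp/call is measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RELATIVEJUMP_INSTRUCTION  0xe9   /* jmp rel32 */
#define RELATIVECALL_INSTRUCTION  0xe8   /* call rel32 */

/* Userspace re-implementation of set_jmp_op(): write the opcode byte
 * plus a 32-bit displacement relative to the instruction's end. */
static void set_jmp_op(unsigned char *from, const unsigned char *to, int call)
{
	int32_t disp = (int32_t)(to - (from + 5));

	from[0] = call ? RELATIVECALL_INSTRUCTION : RELATIVEJUMP_INSTRUCTION;
	memcpy(from + 1, &disp, sizeof(disp));   /* x86 is little-endian */
}
```

Backward jumps simply yield a negative displacement, which is why the same helper serves both the jump into the stub and the jump back to the original instruction stream.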

Thanks,

-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

---
 arch/i386/kernel/kprobes.c |  233 ++++++++++++++++++++++++++++++++++++++++-----
 include/asm-i386/kprobes.h |   11 ++
 2 files changed, 222 insertions(+), 22 deletions(-)

Index: linux-2.6.19-rc5-mm2/arch/i386/kernel/kprobes.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/arch/i386/kernel/kprobes.c
+++ linux-2.6.19-rc5-mm2/arch/i386/kernel/kprobes.c
@@ -26,6 +26,9 @@
  * 2005-May	Hien Nguyen <hien@us.ibm.com>, Jim Keniston
  *		<jkenisto@us.ibm.com> and Prasanna S Panchamukhi
  *		<prasanna@in.ibm.com> added function-return probes.
+ * 2006-Nov	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> and
+ *		Satoshi Oshima <soshima@redhat.com> added direct-jump
+ *		optimization (a.k.a. djprobe).
  */

 #include <linux/kprobes.h>
@@ -40,9 +43,51 @@ void jprobe_return_end(void);

 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+/* Instruction pages for djprobe's stub code */
+static struct kprobe_insn_page_list djprobe_insn_pages = {
+	.list = HLIST_HEAD_INIT,
+	.insn_size = 0,
+	.nr_garbage = 0
+};
+
+/* Used in djprobe_template_holder and kretprobe_trampoline_holder  */
+#define SAVE_REGS_STRING	\
+	"	pushl %gs\n"	\
+	"	pushl %es\n"	\
+	"	pushl %ds\n"	\
+	"	pushl %eax\n"	\
+	"	pushl %ebp\n"	\
+	"	pushl %edi\n"	\
+	"	pushl %esi\n"	\
+	"	pushl %edx\n"	\
+	"	pushl %ecx\n"	\
+	"	pushl %ebx\n"
+#define RESTORE_REGS_STRING	\
+	"	popl %ebx\n"	\
+	"	popl %ecx\n"	\
+	"	popl %edx\n"	\
+	"	popl %esi\n"	\
+	"	popl %edi\n"	\
+	"	popl %ebp\n"	\
+	"	popl %eax\n"

-/* insert a jmp code */
-static __always_inline void set_jmp_op(void *from, void *to)
+/*
+ * On pentium series, Unsynchronized cross-modifying code
+ * operations can cause unexpected instruction execution results.
+ * So after code modified, we should synchronize it on each processor.
+ */
+static void __local_serialize_cpu(void *info)
+{
+	sync_core();
+}
+
+void arch_serialize_cpus(void)
+{
+	on_each_cpu(__local_serialize_cpu, NULL, 1, 1);
+}
+
+/* Insert a jmp code */
+static __always_inline void set_jmp_op(void *from, void *to, int call)
 {
 	struct __arch_jmp_op {
 		char op;
@@ -50,7 +95,11 @@ static __always_inline void set_jmp_op(v
 	} __attribute__((packed)) *jop;
 	jop = (struct __arch_jmp_op *)from;
 	jop->raddr = (long)(to) - ((long)(from) + 5);
-	jop->op = RELATIVEJUMP_INSTRUCTION;
+	if (call) {
+		jop->op = RELATIVECALL_INSTRUCTION;
+	} else {
+		jop->op = RELATIVEJUMP_INSTRUCTION;
+	}
 }

 /*
@@ -363,16 +412,7 @@ no_kprobe:
 			"	pushf\n"
 			/* skip cs, eip, orig_eax */
 			"	subl $12, %esp\n"
-			"	pushl %gs\n"
-			"	pushl %ds\n"
-			"	pushl %es\n"
-			"	pushl %eax\n"
-			"	pushl %ebp\n"
-			"	pushl %edi\n"
-			"	pushl %esi\n"
-			"	pushl %edx\n"
-			"	pushl %ecx\n"
-			"	pushl %ebx\n"
+			SAVE_REGS_STRING
 			"	movl %esp, %eax\n"
 			"	call trampoline_handler\n"
 			/* move eflags to cs */
@@ -380,14 +420,8 @@ no_kprobe:
 			"	movl %edx, 48(%esp)\n"
 			/* save true return address on eflags */
 			"	movl %eax, 52(%esp)\n"
-			"	popl %ebx\n"
-			"	popl %ecx\n"
-			"	popl %edx\n"
-			"	popl %esi\n"
-			"	popl %edi\n"
-			"	popl %ebp\n"
-			"	popl %eax\n"
-			/* skip eip, orig_eax, es, ds, gs */
+			RESTORE_REGS_STRING
+			/* skip gs, ds, es, orig_eax, eip */
 			"	addl $20, %esp\n"
 			"	popf\n"
 			"	ret\n");
@@ -539,7 +573,7 @@ static void __kprobes resume_execution(s
 			 * jumps back to correct address.
 			 */
 			set_jmp_op((void *)regs->eip,
-				   (void *)orig_eip + (regs->eip - copy_eip));
+				   (void *)orig_eip + (regs->eip - copy_eip), 0);
 			p->ainsn.boostable = 1;
 		} else {
 			p->ainsn.boostable = -1;
@@ -753,7 +787,162 @@ int __kprobes longjmp_break_handler(stru
 	return 0;
 }

+#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
+/* Functions for Direct Jump Optimization (djprobe) */
+/* stub template addresses */
+ void __kprobes djprobe_template_holder(void)
+ {
+	asm volatile ( ".global arch_tmpl_stub_entry\n"
+			"arch_tmpl_stub_entry: \n"
+		       "	pushf\n"
+		       /* skip cs, eip, orig_eax */
+		       "	subl $12, %esp\n"
+		       SAVE_REGS_STRING
+		       "	movl %esp, %edx\n"
+		       ".global arch_tmpl_stub_val\n"
+		       "arch_tmpl_stub_val: \n"
+		       "	movl $0xffffffff, %eax\n"
+		       ".global arch_tmpl_stub_call\n"
+		       "arch_tmpl_stub_call: \n"
+		       ASM_NOP5
+		       RESTORE_REGS_STRING
+			/* skip gs, ds, es, orig_eax, eip, cs */
+			"	addl $24, %esp\n"
+			"	popf\n"
+		       ".global arch_tmpl_stub_end\n"
+		       "arch_tmpl_stub_end: \n");
+ }
+
+/* djprobe call back function: called from stub code */
+fastcall static void djprobe_callback(struct djprobe_instance *djpi,
+				      struct pt_regs *regs)
+{
+	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+	preempt_disable();
+	if (kprobe_running()) {
+		kprobes_inc_nmissed_count(&djpi->kp);
+	} else {
+		/* save skipped registers */
+		regs->xcs = __KERNEL_CS;
+		regs->eip = (long)djpi->kp.addr + sizeof(kprobe_opcode_t);
+		regs->orig_eax = 0xffffffff;
+
+		__get_cpu_var(current_kprobe) = &djpi->kp;
+		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+		aggr_pre_handler(&djpi->kp, regs);
+		__get_cpu_var(current_kprobe) = NULL;
+	}
+	preempt_enable_no_resched();
+}
+
+extern kprobe_opcode_t arch_tmpl_stub_entry;
+extern kprobe_opcode_t arch_tmpl_stub_val;
+extern kprobe_opcode_t arch_tmpl_stub_call;
+extern kprobe_opcode_t arch_tmpl_stub_end;
+
+#define STUB_VAL_IDX \
+	((long)&arch_tmpl_stub_val - (long)&arch_tmpl_stub_entry + 1)
+#define STUB_CALL_IDX \
+	((long)&arch_tmpl_stub_call - (long)&arch_tmpl_stub_entry)
+#define STUB_END_IDX \
+	((long)&arch_tmpl_stub_end - (long)&arch_tmpl_stub_entry)
+
+#define INT3_SIZE 1
+#define JUMP_SIZE 5
+#define ADDR_SIZE 4
+#define STUB_SIZE \
+	(STUB_END_IDX + MAX_KPROBE_LENGTH + JUMP_SIZE)
+
+static __always_inline void __codecopy(void *dest, const void *src, size_t size)
+{
+	memcpy(dest, src, size);
+	flush_icache_range((unsigned long)dest, (unsigned long)dest + size);
+}
+
+/*
+ * Copy post processing instructions
+ * Target instructions MUST be relocatable.
+ */
+int __kprobes arch_prepare_djprobe_instance(struct djprobe_instance *djpi)
+{
+	char *stub;
+
+	djpi->stub.insn = __get_insn_slot(&djprobe_insn_pages);
+	if (djpi->stub.insn == NULL) {
+		return -ENOMEM;
+	}
+	stub = (char *)djpi->stub.insn;
+
+	/* copy arch-dep-instance from template */
+	memcpy(stub, &arch_tmpl_stub_entry, STUB_END_IDX);
+
+	/* set probe information */
+	*((long *)(stub + STUB_VAL_IDX)) = (long)djpi;
+	/* set probe function call */
+	set_jmp_op(stub + STUB_CALL_IDX, djprobe_callback, 1);
+
+	/* copy instructions into the out-of-line buffer */
+	memcpy(stub + STUB_END_IDX, djpi->kp.addr, djpi->kp.length);
+
+	/* set returning jmp instruction at the tail of out-of-line buffer */
+	set_jmp_op(stub + STUB_END_IDX + djpi->kp.length,
+		   (char *)djpi->kp.addr + djpi->kp.length, 0);
+
+	flush_icache_range((unsigned long) stub,
+			   (unsigned long) stub + STUB_END_IDX +
+			   djpi->kp.length + JUMP_SIZE);
+	return 0;
+}
+
+void __kprobes arch_release_djprobe_instance(struct djprobe_instance *djpi)
+{
+	if (djpi->stub.insn)
+		__free_insn_slot(&djprobe_insn_pages, djpi->stub.insn, 0);
+}
+
+void __kprobes arch_preoptimize_djprobe_instance(struct djprobe_instance *djpi)
+{
+	long rel =
+	    (long)(djpi->stub.insn) - ((long)(djpi->kp.addr) + JUMP_SIZE);
+	/* insert the destination address only */
+	__codecopy((void *)((char *)djpi->kp.addr + INT3_SIZE), &rel,
+		   ADDR_SIZE);
+}
+
+void __kprobes arch_optimize_djprobe_instance(struct djprobe_instance *djpi)
+{
+	kprobe_opcode_t op = RELATIVEJUMP_INSTRUCTION;
+	__codecopy(djpi->kp.addr, &op, sizeof(kprobe_opcode_t));
+}
+
+void __kprobes arch_unoptimize_djprobe_instance(struct djprobe_instance *djpi)
+{
+	/* change (the 1st byte of) jump to int3. */
+	arch_arm_kprobe(&djpi->kp);
+	arch_serialize_cpus();
+	/*
+	 * recover the instructions covered by the destination address.
+	 * the int3 will be removed by arch_disarm_kprobe()
+	 */
+	__codecopy((void *)((long)djpi->kp.addr + INT3_SIZE),
+		   (void *)((long)djpi->stub.insn + STUB_END_IDX + INT3_SIZE),
+		   ADDR_SIZE);
+}
+
+/* djprobe handler : switch to a bypass code */
+int __kprobes arch_switch_to_stub(struct djprobe_instance *djpi,
+				  struct pt_regs *regs)
+{
+	regs->eip = (unsigned long)djpi->stub.insn;
+	reset_current_kprobe();
+	preempt_enable_no_resched();
+	return 1;		/* already prepared */
+}
+#endif
+
 int __init arch_init_kprobes(void)
 {
+	djprobe_insn_pages.insn_size = STUB_SIZE;
 	return 0;
 }
Index: linux-2.6.19-rc5-mm2/include/asm-i386/kprobes.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/asm-i386/kprobes.h
+++ linux-2.6.19-rc5-mm2/include/asm-i386/kprobes.h
@@ -35,6 +35,7 @@ struct pt_regs;
 typedef u8 kprobe_opcode_t;
 #define BREAKPOINT_INSTRUCTION	0xcc
 #define RELATIVEJUMP_INSTRUCTION 0xe9
+#define RELATIVECALL_INSTRUCTION 0xe8
 #define MAX_INSN_SIZE 16
 #define MAX_STACK_SIZE 64
 #define MIN_STACK_SIZE(ADDR) (((MAX_STACK_SIZE) < \
@@ -44,9 +45,15 @@ typedef u8 kprobe_opcode_t;

 #define JPROBE_ENTRY(pentry)	(kprobe_opcode_t *)pentry
 #define ARCH_SUPPORTS_KRETPROBES
+#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PM)
+#define ARCH_SUPPORTS_DJPROBES
+#endif
 #define  ARCH_INACTIVE_KPROBE_COUNT 0
 #define flush_insn_slot(p)	do { } while (0)

+#define MAX_KPROBE_LENGTH (5 + MAX_INSN_SIZE - 1)
+#define MIN_KPROBE_LENGTH 5
+
 void arch_remove_kprobe(struct kprobe *p);
 void kretprobe_trampoline(void);

@@ -61,6 +68,10 @@ struct arch_specific_insn {
 	int boostable;
 };

+struct arch_djprobe_stub {
+	kprobe_opcode_t *insn;
+};
+
 struct prev_kprobe {
 	struct kprobe *kp;
 	unsigned long status;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC][PATCH 4/4][kprobe](djprobe) delayed invoking commit_kprobes()
  2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2006-11-21  6:57 ` [RFC][PATCH 3/4][kprobe](djprobe) djprobe for i386 architecture code Masami Hiramatsu
@ 2006-11-21  6:59 ` Masami Hiramatsu
  2006-11-21 14:24 ` [RFC][kprobe](djprobe) djprobe examples Masami Hiramatsu
  2006-11-27 16:56 ` [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Ingo Molnar
  5 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-21  6:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Keshavamurthy, Anil S, Ingo Molnar, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi,
	Satoshi Oshima, Hideo Aoki, Yumiko Sugita, Jim Keniston,
	Martin Bligh, Greg Kroah-Hartman

Hi,

This patch invokes the commit_kprobes() function from a worker.
The worker is scheduled automatically by register_kprobe()/
unregister_kprobe() (when the registered/unregistered kprobes
can be optimized) and runs 0.1 seconds later.
If the worker fails to freeze processes, it tries again after
another 0.1 seconds; if it fails 10 times, it stops trying.

Thanks,


-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

---
 kernel/kprobes.c |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

Index: linux-2.6.19-rc5-mm2/kernel/kprobes.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/kprobes.c
+++ linux-2.6.19-rc5-mm2/kernel/kprobes.c
@@ -600,6 +600,15 @@ static int __kprobes __register_kprobe_c
 #ifdef ARCH_SUPPORTS_DJPROBES
 static LIST_HEAD(registering_list);
 static LIST_HEAD(unregistering_list);
+static void commit_work_fn(void * data);
+static DECLARE_WORK(commit_work, commit_work_fn, 0);
+#define KPROBE_COMMIT_DELAY (HZ/10)
+#define MAX_COMMIT_RETRY 10
+
+static __always_inline void kick_delayed_commit(void)
+{
+	schedule_delayed_work(&commit_work, KPROBE_COMMIT_DELAY);
+}

 /* Switch to stub buffer : this handler is invoked before inserting a jump */
 static int __kprobes djprobe_pre_handler(struct kprobe *kp,
@@ -667,6 +676,7 @@ static int __kprobes register_djprobe(st
 		goto out;
 	}
 	list_add(&djpi->list, &registering_list);
+	kick_delayed_commit();

 	register_aggr_kprobe(&djpi->kp, p);
 out:
@@ -694,6 +704,7 @@ static void __kprobes unregister_djprobe
 	struct djprobe_instance *djpi;
 	djpi = container_of(p, struct djprobe_instance, kp);
 	list_add(&djpi->list, &unregistering_list);
+	kick_delayed_commit();
 }

 /* Called with kprobe_mutex held */
@@ -733,16 +744,27 @@ static int __kprobes __commit_djprobes(v
 	return 0;
 }

+static atomic_t try_count = ATOMIC_INIT(0);
 /*
  * Commit to optimize kprobes and remove optimized kprobes.
  */
 int __kprobes commit_kprobes(void)
 {
-	int ret = 0;
+	atomic_set(&try_count, 0);
+	commit_work_fn(NULL);
+	return 0;
+}
+
+static void commit_work_fn(void * data)
+{
+	if (atomic_inc_return(&try_count) > MAX_COMMIT_RETRY) return;
 	mutex_lock(&kprobe_mutex);
-	ret = __commit_djprobes();
+	if (__commit_djprobes() == -EAGAIN) {
+		kick_delayed_commit();	/* try again */
+	} else {
+		atomic_set(&try_count, 0);
+	}
 	mutex_unlock(&kprobe_mutex);
-	return ret;
 }

 #else	/* ARCH_SUPPORTS_DJPROBES */


* [RFC][kprobe](djprobe) djprobe examples
  2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2006-11-21  6:59 ` [RFC][PATCH 4/4][kprobe](djprobe) delayed invoking commit_kprobes() Masami Hiramatsu
@ 2006-11-21 14:24 ` Masami Hiramatsu
  2006-11-27 16:56 ` [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Ingo Molnar
  5 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-21 14:24 UTC (permalink / raw)
  To: Keshavamurthy, Anil S, Ingo Molnar, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi
  Cc: Masami Hiramatsu, Satoshi Oshima, Hideo Aoki, Yumiko Sugita,
	Jim Keniston, Martin Bligh, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

Hi,

I also updated djprobe example code.
I attached it and a helper script.

NOTE:
Currently, this helper script can ONLY measure the *LENGTH* of the
instruction block which will be overwritten by the jump code. It can
*NOT CHECK* whether the instruction block can be executed out of
line (i.e. is relocatable), nor whether any branch jumps into the
target area. We are now developing a more capable helper tool
which can check this.

Here is the example of usage;
1) Analyze the kernel code by using the helper script.
$ ./disym.sh sys_symlink
sys_symlink
0xc017bbe0

/lib/modules/2.6.19-rc1-mm1/build/vmlinux:     file format elf32-i386

Disassembly of section .text:

c017bbe0 <sys_symlink>:
c017bbe0:       83 ec 0c                sub    $0xc,%esp
c017bbe3:       8b 44 24 14             mov    0x14(%esp),%eax

Please be sure that the above-disassembled instructions are relocatable.
Parameter: addr=0xc017bbe0 size=7


2) If the instructions can be executed out of line (e.g. load/store,
 compare, add/sub, etc.) and no branch jumps into them (you can dump
 the whole function by using disym.sh with the '-a' option),
 install the example module with the above parameters.

$ sudo /sbin/insmod ./djprobe_ex.ko addr=0xc017bbe0 size=7


3) Then test it.

$ ln -s foo bar
$ dmesg | tail -n 4
probe install at c017bbe0, size 7
Stopping tasks: =======================================|
Restarting tasks... done
probe call:c017bbe0, caller:c01030c5

$ rm bar
$ ln -s foo bar
$ dmesg | tail -n 5
probe install at c017bbe0, size 7
Stopping tasks: =======================================|
Restarting tasks... done
probe call:c017bbe0, caller:c01030c5
probe call:c017bbe0, caller:c01030c5

4) Finally, remove the module.

$ sudo /sbin/rmmod djprobe_ex.ko
$ dmesg | tail -n 8
probe install at c017bbe0, size 7
Stopping tasks: =======================================|
Restarting tasks... done
probe call:c017bbe0, caller:c01030c5
probe call:c017bbe0, caller:c01030c5
probe uninstall at c017bbe0
Stopping tasks: =======================================|
Restarting tasks... done


Thanks,

-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


[-- Attachment #2: disym.sh --]
[-- Type: text/plain, Size: 1740 bytes --]

#!/bin/sh
# Copyright (C) HITACHI, Ltd. 2005
# Created by M.Hiramatsu <hiramatu@sdl.hitachi.co.jp>

[ $# -gt 3 -o $# -lt 1 ] && echo "usage: disym.sh [-a] <kernel_symbol> [kernel-version]" && exit 0

DISALL=0
if [ "$1" = "-a" ] ;then
DISALL=1
shift 1
fi

SYM=$1
KVER=$2
[ -z "$KVER" ] && KVER=`uname -r`

# pass the byte columns of one objdump line as arguments;
# the byte count comes back via the exit status ($?)
function cntarg () {
return $#
}

SYSMAP=/lib/modules/$KVER/build/System.map
[ -f $SYSMAP ] || SYSMAP=/boot/System.map-`uname -r`
[ -f $SYSMAP ] || SYSMAP=/proc/kallsyms

VMLINUX=/lib/modules/$KVER/build/vmlinux
[ -f $VMLINUX ] || VMLINUX=/boot/vmlinux-`uname -r`
[ -f $VMLINUX ] || VMLINUX=/usr/lib/debug/lib/modules/$KVER/vmlinux

setaddrs () {
XADDR=$1
XEADDR=$2
}

echo $SYM
case $SYM in
	0x*)
	XADDR=$SYM
	SADDR=`printf "%d" $SYM`
	EADDR=`expr $SADDR + 5`
	;;
	*)
	if [ $DISALL -eq 1 ] ;then
	setaddrs `sort $SYSMAP | grep -A1 " $SYM"$  | cut -f 1 -d\ `
	if [ -z "$XADDR" ] ; then 
		echo "Error : $SYM was not found in "$SYSMAP
		exit 0;
	fi
	XADDR=0x$XADDR
	XEADDR=0x$XEADDR
	SADDR=`printf "%d" $XADDR` 
	EADDR=`printf "%d" $XEADDR` 
	else
	XADDR=0x`grep " $SYM"$ $SYSMAP | cut -f 1 -d\ `
	if [ "$XADDR" = "0x" ] ; then 
		echo "Error : $SYM was not found in "$SYSMAP
		exit 0;
	fi
	SADDR=`printf "%d" $XADDR` 
	EADDR=`expr $SADDR + 5`
	fi
	;;
esac
echo $XADDR

objdump -w --start-address=$SADDR --stop-address=$EADDR -j ".text" -d $VMLINUX
echo 
LLINE=`objdump -w --start-address=$SADDR --stop-address=$EADDR -j ".text" -d $VMLINUX | tail -n 1 | sed s/"	"/\:/g`
EXADDR=`echo $LLINE | cut -f 1 -d:`
cntarg `echo $LLINE | cut -f 3 -d:`
DIFF=$?
EADDR=`printf "%d" 0x$EXADDR`
SIZE=`expr $EADDR - $SADDR + $DIFF`
echo "Please be sure that the above-disassembled instructions are relocatable."
echo "Parameter: addr=$XADDR size=$SIZE"

[-- Attachment #3: djprobe_ex.c --]
[-- Type: text/plain, Size: 2225 bytes --]

/* 
 djprobe_ex.c -- Direct Jump Probe Example
 Copyright (c) 2005,2006 Hitachi,Ltd.,
 Created by Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
 
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2 of the License, or
 (at your option) any later version.

 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.

 You should have received a copy of the GNU General Public License
 along with this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/
#include <linux/version.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/rcupdate.h>
#include <linux/kprobes.h>

static long addr=0;
module_param(addr, long, 0444);
static long offs=0;
module_param(offs, long, 0444);
static long size=0;
module_param(size, long, 0444);
static long show_arg=0;
module_param(show_arg, long, 0444);

#define CALLER(regs) (((unsigned long *)&regs->esp)[0])
#define ARG(n,regs) (((unsigned long *)&regs->esp)[n]) /*arg1: ARG(1,stadr)*/

static int probe_func(struct kprobe *kp, struct pt_regs *regs)
{
	int i;
	printk("probe call:%p, caller:%lx", 
	       (void*)kp->addr, CALLER(regs));
	for (i = 1; i <= show_arg; i++) {
		printk(" arg[%d]:%lx", i, ARG(i, regs));
	}
	printk("\n");
	return 0;
}

static struct kprobe kp;

static int install_probe(void) 
{
	if (addr == 0) {
		return -EINVAL;
	}
	printk("probe install at 0x%p+0x%lx, size %ld\n", 
	       (void*)addr, offs, size);

	kp.pre_handler = probe_func;
	kp.addr = (void *)addr;
	kp.length = size;
	if (register_kprobe(&kp) != 0)
		return -1;
	commit_kprobes();
	return 0;
}

static void uninstall_probe(void)
{
	unregister_kprobe(&kp);
	printk("probe uninstall at %p\n", (void*)addr);
	commit_kprobes();	/* commit for safety */
}

module_init(install_probe);
module_exit(uninstall_probe);
MODULE_AUTHOR("M.Hiramatsu <masami.hiramatsu.pt@hitachi.com>");
MODULE_LICENSE("GPL");



* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2006-11-21 14:24 ` [RFC][kprobe](djprobe) djprobe examples Masami Hiramatsu
@ 2006-11-27 16:56 ` Ingo Molnar
  2006-11-28  0:05   ` Frank Ch. Eigler
  5 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2006-11-27 16:56 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Keshavamurthy, Anil S, SystemTAP, Ananth N Mavinakayanahalli,
	Prasanna S Panchamukhi, Satoshi Oshima, Hideo Aoki,
	Yumiko Sugita, Jim Keniston, Martin Bligh, Greg Kroah-Hartman

On Tue, 2006-11-21 at 15:48 +0900, Masami Hiramatsu wrote:
> Hi Anil and Ingo,
> 
> I integrated the essence of the djprobe into kprobes. For this
> purpose, I introduced the length member in the kprobe structure.
> 
> If you'd like to use it, specify the length of the instructions
> which will be replaced by a jump code to that length member.
> (Of cause, you also have to check whether the instructions are
>  relocatable and don't include any jump target.) 

cool stuff!

I'm wondering whether it could be made a 100% transparent speedup to
kprobes: how hard would it be to do a simplified disassembly of the
target address to automate the 'this kprobe can safely be turned into a
djprobe transparently' step, and hence to make this change completely
invisible to user-space utilities? Userspace would have to do something
like this anyway (unless i'm missing something), correct?

It might also be useful to implement some sort of query functionality,
to enable userspace to see which probes are sped up and which are not.
This could be a list of all probe points in /sys or /proc or /debugfs -
or a syscall extension - whichever fits the purpose best.

also, it would be nice to submit this to Andrew for -mm inclusion.

	Ingo


* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-27 16:56 ` [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Ingo Molnar
@ 2006-11-28  0:05   ` Frank Ch. Eigler
  2006-11-28  2:44     ` Satoshi Oshima
  0 siblings, 1 reply; 13+ messages in thread
From: Frank Ch. Eigler @ 2006-11-28  0:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Masami Hiramatsu, Keshavamurthy, Anil S, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi,
	Satoshi Oshima, Hideo Aoki, Yumiko Sugita, Jim Keniston,
	Martin Bligh, Greg Kroah-Hartman


mingo wrote:

> [...]  I'm wondering whether it could be made a 100% transparent
> speedup to kprobes: how hard would it be to do a simplified
> disassembly of the target address to automate the 'this kprobe can
> safely be turned into a djprobe transparently' step [...]

The entire criterion is not easy to check at the binary level.  In
particular, it is hard to tell whether some part of the overlaid
instruction sequence is the possible target of a branch elsewhere.

> Userspace would have to do something like this anyway (unless i'm
> missing something), correct? [...]

Yes, but debug- or symbol-consuming userspace would have more
information to judge from.

- FChE


* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-28  0:05   ` Frank Ch. Eigler
@ 2006-11-28  2:44     ` Satoshi Oshima
  2006-11-28 15:01       ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Satoshi Oshima @ 2006-11-28  2:44 UTC (permalink / raw)
  To: Frank Ch. Eigler, Ingo Molnar
  Cc: Masami Hiramatsu, Keshavamurthy, Anil S, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi, Hideo Aoki,
	Yumiko Sugita, Jim Keniston, Martin Bligh, Greg Kroah-Hartman

Frank Ch. Eigler wrote:
> mingo wrote:
> 
>> [...]  I'm wondering whether it could be made a 100% transparent
>> speedup to kprobes: how hard would it be to do a simplified
>> disassembly of the target address to automate the 'this kprobe can
>> safely be turned into a djprobe transparently' step [...]
> 
> The entire criterion is not easy to check at the binary point.  In
> particular, it is hard to tell whether some part of the overlaid
> instruction sequence is the possible target of a branch elsewhere.

Yes, this is the problem. We couldn't find any good way to
ensure the safety of a branch target without debuginfo.

We are now developing a safety-check tool based on elfutils.

During this safety-check process, the userspace tool must count
the length of the replaced instructions anyway, so we chose the
length as the trigger to enable djprobes. Counting the length of
instructions inside the kernel is not the problem.

If you have any suggestions on this, we would appreciate them.


Satoshi


* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-28  2:44     ` Satoshi Oshima
@ 2006-11-28 15:01       ` Ingo Molnar
  2006-11-28 15:56         ` Masami Hiramatsu
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2006-11-28 15:01 UTC (permalink / raw)
  To: Satoshi Oshima
  Cc: Frank Ch. Eigler, Masami Hiramatsu, Keshavamurthy, Anil S,
	SystemTAP, Ananth N Mavinakayanahalli, Prasanna S Panchamukhi,
	Hideo Aoki, Yumiko Sugita, Jim Keniston, Martin Bligh,
	Greg Kroah-Hartman

On Mon, 2006-11-27 at 19:05 -0500, Satoshi Oshima wrote:
> Yes. This is the problem. We couldn't find anything good way to
> ensure the safety of branch target without debuginfo. 

If existing in-kernel debug info is not enough then i'd suggest to add
an extra build pass to the kernel to add it (dependent on
CONFIG_KPROBES) - a'la CONFIG_UNWIND_INFO. Am i missing something?

	Ingo


* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-28 15:01       ` Ingo Molnar
@ 2006-11-28 15:56         ` Masami Hiramatsu
  2006-11-28 16:34           ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Masami Hiramatsu @ 2006-11-28 15:56 UTC (permalink / raw)
  To: Ingo Molnar, Satoshi Oshima
  Cc: Frank Ch. Eigler, Keshavamurthy, Anil S, SystemTAP,
	Ananth N Mavinakayanahalli, Prasanna S Panchamukhi, Hideo Aoki,
	Yumiko Sugita, Jim Keniston, Martin Bligh, Greg Kroah-Hartman

Hi Ingo,

Thank you for reviewing!

Ingo Molnar wrote:
> On Mon, 2006-11-27 at 19:05 -0500, Satoshi Oshima wrote:
>> Yes. This is the problem. We couldn't find anything good way to
>> ensure the safety of branch target without debuginfo. 

I think Satoshi said we can find a good way to ensure safety with
debuginfo.

> If existing in-kernel debug info is not enough then i'd suggest to add
> an extra build pass to the kernel to add it (dependent on
> CONFIG_KPROBES) - a'la CONFIG_UNWIND_INFO. Am i missing something?

As far as I know, there is no debuginfo in the kernel image that
is loaded into memory; the debuginfo is only included in the
"vmlinux" file (not the "vmlinuz" file).

So I think he meant that we can ensure safety by analyzing
the "vmlinux" file.
Is that right, Satoshi?

Thanks,

-- 
Masami HIRAMATSU
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-28 15:56         ` Masami Hiramatsu
@ 2006-11-28 16:34           ` Ingo Molnar
  2006-11-28 17:52             ` Frank Ch. Eigler
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2006-11-28 16:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Satoshi Oshima, Frank Ch. Eigler, Keshavamurthy, Anil S,
	SystemTAP, Ananth N Mavinakayanahalli, Prasanna S Panchamukhi,
	Hideo Aoki, Yumiko Sugita, Jim Keniston, Martin Bligh,
	Greg Kroah-Hartman

On Tue, 2006-11-28 at 23:40 +0900, Masami Hiramatsu wrote:
> > If existing in-kernel debug info is not enough then i'd suggest to
> add
> > an extra build pass to the kernel to add it (dependent on
> > CONFIG_KPROBES) - a'la CONFIG_UNWIND_INFO. Am i missing something?
> 
> As far as I know, there is no debuginfo in the kernel which is
> loaded in the memory. But the debuginfo is included in the
> "vmlinux" file (not the "vmlinuz" file).

correct, that sort of debug info is not included in the kernel image -
but some of it could be included - just like unwind info, or kallsyms
and other info is included currently. Whatever can be extracted at build
time we can also insert into the kernel image. The info that is needed
here is a table of all valid instruction boundaries, correct?

OTOH, userspace (SystemTap) has all this information handy already,
because it already has to parse the -g debuginfo - hence it's more
natural to delegate this there?

	Ingo


* Re: [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes
  2006-11-28 16:34           ` Ingo Molnar
@ 2006-11-28 17:52             ` Frank Ch. Eigler
  0 siblings, 0 replies; 13+ messages in thread
From: Frank Ch. Eigler @ 2006-11-28 17:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Masami Hiramatsu, Satoshi Oshima, Frank Ch. Eigler,
	Keshavamurthy, Anil S, SystemTAP, Ananth N Mavinakayanahalli,
	Prasanna S Panchamukhi, Hideo Aoki, Yumiko Sugita, Jim Keniston,
	Martin Bligh, Greg Kroah-Hartman

Hi -

On Tue, Nov 28, 2006 at 04:00:03PM +0100, Ingo Molnar wrote:

> [...]  correct, that sort of debug info is not included in the
> kernel image - but some of it could be included - just like unwind
> info, or kallsyms and other info is included currently. Whatever can
> be extracted at build time we can also insert into the kernel image.

That's true, but the total possibly needed information is huge.

> The info that is needed here is a table of all valid instruction
> boundaries, correct?

It's not clear that this is sufficient.  Statement boundaries (to the
extent they still exist in object code) may be the thing.

> OTOH, userspace (SystemTap) has all this information handy already,
> because it already has to parse the -g debuginfo - hence it's more
> natural to delegate this there?

Yes, probably.  While this leaves open the possibility that the
kernel dies if user space screws up badly enough, that is nothing new.

- FChE


end of thread, other threads:[~2006-11-28 15:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-11-21  6:53 [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Masami Hiramatsu
2006-11-21  6:55 ` [RFC][PATCH 1/4][kprobe](djprobe) generalize the length of the instruction Masami Hiramatsu
2006-11-21  6:56 ` [RFC][PATCH 2/4][kprobe](djprobe) Direct jump optimized kprobes core patch Masami Hiramatsu
2006-11-21  6:57 ` [RFC][PATCH 3/4][kprobe](djprobe) djprobe for i386 architecture code Masami Hiramatsu
2006-11-21  6:59 ` [RFC][PATCH 4/4][kprobe](djprobe) delayed invoking commit_kprobes() Masami Hiramatsu
2006-11-21 14:24 ` [RFC][kprobe](djprobe) djprobe examples Masami Hiramatsu
2006-11-27 16:56 ` [RFC][PATCH 0/4][kprobe](djprobe) Direct jump optimized kprobes Ingo Molnar
2006-11-28  0:05   ` Frank Ch. Eigler
2006-11-28  2:44     ` Satoshi Oshima
2006-11-28 15:01       ` Ingo Molnar
2006-11-28 15:56         ` Masami Hiramatsu
2006-11-28 16:34           ` Ingo Molnar
2006-11-28 17:52             ` Frank Ch. Eigler
