[RFC 5/5] uprobes: add global breakpoints

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [RFC 5/5] uprobes: add global breakpoints
       [not found] <1344355952-2382-1-git-send-email-bigeasy@linutronix.de>
@ 2012-08-07 16:13 ` Sebastian Andrzej Siewior
  2012-08-08 13:19   ` Oleg Nesterov
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-07 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Peter Zijlstra, Arnaldo Carvalho de Melo, Roland McGrath,
	Oleg Nesterov, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, Sebastian Andrzej Siewior, gdb-patches

By setting an uprobe tracepoint, one learns whenever a certain point
within a program is reached / passed. This is recorded and the
application continues.
This patch adds the ability to hold the program once this point has been
passed and the user may attach to the program via ptrace.
First, setup a global breakpoint which is very similar to a uprobe trace
point:

|echo 'g /home/bigeasy/sample:0x0000044d %ip %ax' > uprobe_events

This is exactly what uprobe does except that it starts with the letter
'g' instead of 'p'.

Step two is to enable it:
|echo 1 > events/uprobes/enable

Lets assume you execute ./sample and the breakpoint is hit. In ps you will
see:
|1938 pts/1    t+     0:00 ./sample

Now you can attach gdb via 'gdb -p 1938'. The gdb can now interact with
the tracee and inspect its registers, its stack, single step, let it
runâ€¦
In case the process is not of great interest, the user may continue
without gdb by writting its pid into the uprobe_gp_wakeup file

|echo 1938 > uprobe_gp_wakeup

What I miss right now is an interface to tell the user/gdb that there is a
program that hit a global breakpoint and is waiting for further instructions.
A "tail -f trace" does not work and may contain also a lot of other
informations. I've been thinking about a poll()able file which returns pids of
tasks which are put on hold. Other suggestions?

Cc: gdb-patches@sourceware.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/uprobes.h     |   10 +++++++
 kernel/events/uprobes.c     |   29 +++++++++++++++++--
 kernel/ptrace.c             |    4 ++-
 kernel/trace/trace_uprobe.c |   67 +++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0fc6585..991a665 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -63,6 +63,9 @@ enum uprobe_task_state {
 	UTASK_SSTEP,
 	UTASK_SSTEP_ACK,
 	UTASK_SSTEP_TRAPPED,
+	UTASK_TRACE_SLEEP,
+	UTASK_TRACE_WOKEUP_NORMAL,
+	UTASK_TRACE_WOKEUP_TRACED,
 };
 
 /*
@@ -76,6 +79,7 @@ struct uprobe_task {
 
 	unsigned long			xol_vaddr;
 	unsigned long			vaddr;
+	int				skip_handler;
 };
 
 /*
@@ -120,6 +124,8 @@ extern bool uprobe_deny_signal(void);
 extern bool __weak arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void uprobe_clear_state(struct mm_struct *mm);
 extern void uprobe_reset_state(struct mm_struct *mm);
+extern int uprobe_wakeup_task(struct task_struct *t, int traced);
+
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
@@ -163,5 +169,9 @@ static inline void uprobe_clear_state(struct mm_struct *mm)
 static inline void uprobe_reset_state(struct mm_struct *mm)
 {
 }
+static inline int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+	return 0;
+}
 #endif /* !CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c8e5204..f1326a2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1473,6 +1473,22 @@ void __weak arch_uprobe_disable_step(struct arch_uprobe *arch)
 	user_disable_single_step(current);
 }
 
+int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+	struct uprobe_task *utask;
+
+	utask = t->utask;
+	if (!utask)
+		return -EINVAL;
+	if (utask->state != UTASK_TRACE_SLEEP)
+		return -EINVAL;
+
+	utask->state = traced ?
+		UTASK_TRACE_WOKEUP_TRACED : UTASK_TRACE_WOKEUP_NORMAL;
+	wake_up_state(t, __TASK_TRACED);
+	return 0;
+}
+
 /*
  * Run handler and ask thread to singlestep.
  * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1513,7 +1529,16 @@ static void handle_swbp(struct pt_regs *regs)
 			goto cleanup_ret;
 	}
 	utask->active_uprobe = uprobe;
-	handler_chain(uprobe, regs);
+	if (utask->skip_handler)
+		utask->skip_handler = 0;
+	else
+		handler_chain(uprobe, regs);
+
+	if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
+		send_sig(SIGTRAP, current, 0);
+		utask->skip_handler = 1;
+		goto cleanup_ret;
+	}
 	if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs))
 		goto cleanup_ret;
 
@@ -1528,7 +1553,7 @@ cleanup_ret:
 		utask->active_uprobe = NULL;
 		utask->state = UTASK_RUNNING;
 	}
-	if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+	if (!(uprobe->flags & UPROBE_SKIP_SSTEP) || utask->skip_handler)
 
 		/*
 		 * cannot singlestep; cannot skip instruction;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a232bb5..5d6d3ed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	__ptrace_link(task, current);
 
 	/* SEIZE doesn't trap tracee on attach */
-	if (!seize)
+	if (!seize) {
 		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+		uprobe_wakeup_task(task, 1);
+	}
 
 	spin_lock(&task->sighand->siglock);
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f3c3811..0aabee4 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -48,6 +48,7 @@ struct trace_uprobe {
 	unsigned int			flags;	/* For TP_FLAG_* */
 	ssize_t				size;	/* trace entry size */
 	unsigned int			nr_args;
+	bool				is_gb;
 	struct probe_arg		args[];
 };
 
@@ -177,19 +178,24 @@ static int create_trace_uprobe(int argc, char **argv)
 	struct path path;
 	unsigned long offset;
 	bool is_delete;
+	bool is_gb;
 	int i, ret;
 
 	inode = NULL;
 	ret = 0;
 	is_delete = false;
+	is_gb = false;
 	event = NULL;
 	group = NULL;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
 		is_delete = true;
+	else if (argv[0][0] == 'g')
+		is_gb = true;
 	else if (argv[0][0] != 'p') {
-		pr_info("Probe definition must be started with 'p' or '-'.\n");
+		pr_info("Probe definition must be started with 'p', 'g' or "
+				"'-'.\n");
 		return -EINVAL;
 	}
 
@@ -277,7 +283,8 @@ static int create_trace_uprobe(int argc, char **argv)
 		if (ptr)
 			*ptr = '\0';
 
-		snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset);
+		snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx",
+				is_gb ? 'g' : 'p', tail, offset);
 		event = buf;
 		kfree(tail);
 	}
@@ -298,6 +305,8 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto error;
 	}
 
+	tu->is_gb = is_gb;
+
 	/* parse arguments */
 	ret = 0;
 	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
@@ -394,8 +403,12 @@ static int probes_seq_show(struct seq_file *m, void *v)
 {
 	struct trace_uprobe *tu = v;
 	int i;
+	char type = 'p';
+
+	if (tu->is_gb)
+		type = 'g';
 
-	seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
+	seq_printf(m, "%c:%s/%s", type, tu->call.class->system, tu->call.name);
 	seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);
 
 	for (i = 0; i < tu->nr_args; i++)
@@ -435,6 +448,38 @@ static const struct file_operations uprobe_events_ops = {
 	.write		= probes_write,
 };
 
+static ssize_t uprobes_gp_wakeup_write(struct file *filp,
+		const char __user *ubuf, size_t count, loff_t *ppos)
+{
+	struct task_struct *child;
+	unsigned long pid;
+	int ret;
+
+	ret = kstrtoul_from_user(ubuf, count, 0, &pid);
+	if (ret)
+		return ret;
+
+	rcu_read_lock();
+	child = find_task_by_vpid(pid);
+	if (child)
+		get_task_struct(child);
+	rcu_read_unlock();
+
+	if (!child)
+		return -EINVAL;
+
+	ret = uprobe_wakeup_task(child, 0);
+	put_task_struct(child);
+	return ret ? ret : count;
+}
+
+static const struct file_operations uprobe_gp_wakeup_ops = {
+	.owner		= THIS_MODULE,
+	.open		= simple_open,
+	.llseek		= noop_llseek,
+	.write		= uprobes_gp_wakeup_write,
+};
+
 /* Probes profiling interfaces */
 static int probes_profile_seq_show(struct seq_file *m, void *v)
 {
@@ -704,6 +749,17 @@ int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type,
 	return 0;
 }
 
+static void uprobe_wait_traced(struct trace_uprobe *tu)
+{
+	struct uprobe_task *utask;
+
+	utask = current->utask;
+	utask->state = UTASK_TRACE_SLEEP;
+
+	set_current_state(TASK_TRACED);
+	schedule();
+}
+
 static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 {
 	struct uprobe_trace_consumer *utc;
@@ -721,6 +777,9 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 	if (tu->flags & TP_FLAG_PROFILE)
 		uprobe_perf_func(tu, regs);
 #endif
+	if (tu->is_gb)
+		uprobe_wait_traced(tu);
+
 	return 0;
 }
 
@@ -779,6 +838,8 @@ static __init int init_uprobe_trace(void)
 
 	trace_create_file("uprobe_events", 0644, d_tracer,
 				    NULL, &uprobe_events_ops);
+	trace_create_file("uprobe_gb_wakeup", 0644, d_tracer,
+				    NULL, &uprobe_gp_wakeup_ops);
 	/* Profile interface */
 	trace_create_file("uprobe_profile", 0444, d_tracer,
 				    NULL, &uprobe_profile_ops);
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-07 16:13 ` [RFC 5/5] uprobes: add global breakpoints Sebastian Andrzej Siewior
@ 2012-08-08 13:19   ` Oleg Nesterov
  2012-08-09 17:18     ` Sebastian Andrzej Siewior
  2012-08-09 18:25   ` Stan Shebs
  2012-08-13 11:35   ` Peter Zijlstra
  2 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2012-08-08 13:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Roland McGrath, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, gdb-patches

On 08/07, Sebastian Andrzej Siewior wrote:
>
> By setting an uprobe tracepoint, one learns whenever a certain point
> within a program is reached / passed. This is recorded and the
> application continues.
> This patch adds the ability to hold the program once this point has been
> passed and the user may attach to the program via ptrace.
> First, setup a global breakpoint which is very similar to a uprobe trace
> point:
>
> |echo 'g /home/bigeasy/sample:0x0000044d %ip %ax' > uprobe_events
>
> This is exactly what uprobe does except that it starts with the letter
> 'g' instead of 'p'.
>
> Step two is to enable it:
> |echo 1 > events/uprobes/enable
>
> Lets assume you execute ./sample and the breakpoint is hit. In ps you will
> see:
> |1938 pts/1    t+     0:00 ./sample
>
> Now you can attach gdb via 'gdb -p 1938'. The gdb can now interact with
> the tracee and inspect its registers, its stack, single step, let it
> runâ€¦
> In case the process is not of great interest, the user may continue
> without gdb by writting its pid into the uprobe_gp_wakeup file
>
> |echo 1938 > uprobe_gp_wakeup
>
> What I miss right now is an interface to tell the user/gdb that there is a
> program that hit a global breakpoint and is waiting for further instructions.
> A "tail -f trace" does not work and may contain also a lot of other
> informations. I've been thinking about a poll()able file which returns pids of
> tasks which are put on hold. Other suggestions?

Honestly, I am not sure this is that useful...

OK, I'll try to read this patch later. But, at first glance,

> +int uprobe_wakeup_task(struct task_struct *t, int traced)
> +{
> +	struct uprobe_task *utask;
> +
> +	utask = t->utask;
> +	if (!utask)
> +		return -EINVAL;
> +	if (utask->state != UTASK_TRACE_SLEEP)
> +		return -EINVAL;
> +
> +	utask->state = traced ?
> +		UTASK_TRACE_WOKEUP_TRACED : UTASK_TRACE_WOKEUP_NORMAL;
> +	wake_up_state(t, __TASK_TRACED);
> +	return 0;
> +}

This can obviously race with uprobe_wait_traced(), see below

> @@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
>  	__ptrace_link(task, current);
>
>  	/* SEIZE doesn't trap tracee on attach */
> -	if (!seize)
> +	if (!seize) {
>  		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
> +		uprobe_wakeup_task(task, 1);
> +	}

Can't understand why uprobe_wakeup_task() depends on !PTRACE_SEIZE

> +static void uprobe_wait_traced(struct trace_uprobe *tu)
> +{
> +	struct uprobe_task *utask;
> +
> +	utask = current->utask;
> +	utask->state = UTASK_TRACE_SLEEP;

WINDOW

> +
> +	set_current_state(TASK_TRACED);
> +	schedule();
> +}

Suppose that uprobe_wakeup_task() is called in the WINDOW above.

OTOH, uprobe_wakeup_task() can race with itself if it is called
twice at the same time, say from uprobes_gp_wakeup_write() and
ptrace_attach().

Oleg.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-08 13:19   ` Oleg Nesterov
@ 2012-08-09 17:18     ` Sebastian Andrzej Siewior
  2012-08-13 13:20       ` Oleg Nesterov
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-09 17:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, x86, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Roland McGrath, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, gdb-patches

* Oleg Nesterov | 2012-08-08 15:14:57 [+0200]:

>> What I miss right now is an interface to tell the user/gdb that there is a
>> program that hit a global breakpoint and is waiting for further instructions.
>> A "tail -f trace" does not work and may contain also a lot of other
>> informations. I've been thinking about a poll()able file which returns pids of
>> tasks which are put on hold. Other suggestions?
>
>Honestly, I am not sure this is that useful...

How would you notify gdb that there is a new task that hit a breakpoint?
Or learn yourself?

>OK, I'll try to read this patch later. But, at first glance,
Thank you.

>> @@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
>>  	__ptrace_link(task, current);
>>
>>  	/* SEIZE doesn't trap tracee on attach */
>> -	if (!seize)
>> +	if (!seize) {
>>  		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
>> +		uprobe_wakeup_task(task, 1);
>> +	}
>
>Can't understand why uprobe_wakeup_task() depends on !PTRACE_SEIZE

because in the SEIZE case the task isn't halted, it continues to run. Or
do you want to use PTRACE_SEIZE for tasks which hit the global
breakpoint and you have no interrest in them and want them to continue
like nothing happend?

>> +
>> +	set_current_state(TASK_TRACED);
>> +	schedule();
>> +}
>
>Suppose that uprobe_wakeup_task() is called in the WINDOW above.
>
>OTOH, uprobe_wakeup_task() can race with itself if it is called
>twice at the same time, say from uprobes_gp_wakeup_write() and
>ptrace_attach().
Okay, I'm going to close the window.

>
>Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-07 16:13 ` [RFC 5/5] uprobes: add global breakpoints Sebastian Andrzej Siewior
  2012-08-08 13:19   ` Oleg Nesterov
@ 2012-08-09 18:25   ` Stan Shebs
  2012-08-13 11:35   ` Peter Zijlstra
  2 siblings, 0 replies; 13+ messages in thread
From: Stan Shebs @ 2012-08-09 18:25 UTC (permalink / raw)
  To: gdb-patches

On 8/7/12 9:12 AM, Sebastian Andrzej Siewior wrote:
> What I miss right now is an interface to tell the user/gdb that there is a
> program that hit a global breakpoint and is waiting for further instructions.
> A "tail -f trace" does not work and may contain also a lot of other
> informations. I've been thinking about a poll()able file which returns pids of
> tasks which are put on hold. Other suggestions?
>

In case anybody is having a mysterious sense of deja vu about this, my 
patch last year for global breakpoints ( 
http://sourceware.org/ml/gdb-patches/2011-06/msg00163.html and friends) 
had the same issue, and solved this by opening a device /dev/breakpoint 
and hooking it up to a GDB event handler.  (GDB event handling 
ultimately calls poll() on the collection of event sources.)

Stan
stan@codesourcery.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-07 16:13 ` [RFC 5/5] uprobes: add global breakpoints Sebastian Andrzej Siewior
  2012-08-08 13:19   ` Oleg Nesterov
  2012-08-09 18:25   ` Stan Shebs
@ 2012-08-13 11:35   ` Peter Zijlstra
  2012-08-20 15:27     ` Sebastian Andrzej Siewior
  2012-08-21 19:42     ` [RFC 5/5 v2] " Sebastian Andrzej Siewior
  2 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2012-08-13 11:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Arnaldo Carvalho de Melo, Roland McGrath,
	Oleg Nesterov, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, gdb-patches

On Tue, 2012-08-07 at 18:12 +0200, Sebastian Andrzej Siewior wrote:
> By setting an uprobe tracepoint, one learns whenever a certain point
> within a program is reached / passed. This is recorded and the
> application continues.
> This patch adds the ability to hold the program once this point has been
> passed and the user may attach to the program via ptrace.
> First, setup a global breakpoint which is very similar to a uprobe trace
> point:
> 
> |echo 'g /home/bigeasy/sample:0x0000044d %ip %ax' > uprobe_events
> 
> This is exactly what uprobe does except that it starts with the letter
> 'g' instead of 'p'.
> 
> Step two is to enable it:
> |echo 1 > events/uprobes/enable
> 
> Lets assume you execute ./sample and the breakpoint is hit. In ps you will
> see:
> |1938 pts/1    t+     0:00 ./sample

This seems particularly dangerous.. suppose you tag a frequent function
(say malloc) and the entire userspace freezes, including your shell.

> Now you can attach gdb via 'gdb -p 1938'. The gdb can now interact with
> the tracee and inspect its registers, its stack, single step, let it
> run…
> In case the process is not of great interest, the user may continue
> without gdb by writting its pid into the uprobe_gp_wakeup file
> 
> |echo 1938 > uprobe_gp_wakeup
> 
> What I miss right now is an interface to tell the user/gdb that there is a
> program that hit a global breakpoint and is waiting for further instructions.
> A "tail -f trace" does not work and may contain also a lot of other
> informations. I've been thinking about a poll()able file which returns pids of
> tasks which are put on hold. Other suggestions? 

I'm not really happy with any of this. I would suggest limiting this
stuff much further, like say only have it affect ptraced
processes/tasks. That way you cannot accidentally freeze the entire
system into oblivion.

GDB (and assorted stuff) can already track an entire process hierarchy
with fork follow stuffs.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-09 17:18     ` Sebastian Andrzej Siewior
@ 2012-08-13 13:20       ` Oleg Nesterov
  2012-08-14 11:44         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2012-08-13 13:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Roland McGrath, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, gdb-patches

On 08/09, Sebastian Andrzej Siewior wrote:
>
> * Oleg Nesterov | 2012-08-08 15:14:57 [+0200]:
>
> >> What I miss right now is an interface to tell the user/gdb that there is a
> >> program that hit a global breakpoint and is waiting for further instructions.
> >> A "tail -f trace" does not work and may contain also a lot of other
> >> informations. I've been thinking about a poll()able file which returns pids of
> >> tasks which are put on hold. Other suggestions?
> >
> >Honestly, I am not sure this is that useful...
>
> How would you notify gdb that there is a new task that hit a breakpoint?
> Or learn yourself?

But why do we need this?

OK, you do not need to convince me, I try to never argue with
new features.

However, I certainly dislike TASK_TRACED in uprobe_wait_traced().
And sleeping in ->handler() is not fair to other consumers.

And I do not think you should modify ptrace_attach() at all.
gdb/user can wakeup the task after PTRACE_ATTACH itself.

Oleg.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-13 13:20       ` Oleg Nesterov
@ 2012-08-14 11:44         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-14 11:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, x86, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Roland McGrath, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, gdb-patches

On 08/13/2012 03:16 PM, Oleg Nesterov wrote:
> On 08/09, Sebastian Andrzej Siewior wrote:
>>
>> * Oleg Nesterov | 2012-08-08 15:14:57 [+0200]:
>>
>>>> What I miss right now is an interface to tell the user/gdb that there is a
>>>> program that hit a global breakpoint and is waiting for further instructions.
>>>> A "tail -f trace" does not work and may contain also a lot of other
>>>> informations. I've been thinking about a poll()able file which returns pids of
>>>> tasks which are put on hold. Other suggestions?
>>>
>>> Honestly, I am not sure this is that useful...
>>
>> How would you notify gdb that there is a new task that hit a breakpoint?
>> Or learn yourself?
>
> But why do we need this?

Shouldn't we learn somehow that a process hits a breakpoint? The task
was not yet monitored by gdb.

> OK, you do not need to convince me, I try to never argue with
> new features.

If there is a simple mechanism, I would switch to it. Right now I think
about using this "notification mechanism" to auto-exlude the listener
(and its parents) from the list of possible targets. So I don't freeze
the whole system while I have a breakpoint at malloc() in libc.

> However, I certainly dislike TASK_TRACED in uprobe_wait_traced().
> And sleeping in ->handler() is not fair to other consumers.

I added it as the last task in current consumer. I could move it out of
the consumer loop and freeze it after all consumer are handled but then
I lose the filter member (which is currently NULL, I know).

> And I do not think you should modify ptrace_attach() at all.
> gdb/user can wakeup the task after PTRACE_ATTACH itself.

I see. gdb / strace --pid $num" gdb does PTRACE_ATTACH and waits
afterwords in wait() indefinitely for the SIGSTOP which is blocked
since the process is already in TASK_TRACED. This is nice since the
signals are blocked and are delivered once the task is unfrozed.

> Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] uprobes: add global breakpoints
  2012-08-13 11:35   ` Peter Zijlstra
@ 2012-08-20 15:27     ` Sebastian Andrzej Siewior
  2012-08-21 19:42     ` [RFC 5/5 v2] " Sebastian Andrzej Siewior
  1 sibling, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-20 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, x86, Arnaldo Carvalho de Melo, Roland McGrath,
	Oleg Nesterov, Srikar Dronamraju, Ananth N Mavinakaynahalli,
	stan_shebs, gdb-patches

On 08/13/2012 01:34 PM, Peter Zijlstra wrote:
> I'm not really happy with any of this. I would suggest limiting this
> stuff much further, like say only have it affect ptraced
> processes/tasks. That way you cannot accidentally freeze the entire
> system into oblivion.

I'be been browsing over the cgroup Documentation and it seems to look
usefull. What I have in mind is the following:
/sys/fs/cgroup/gb

The root group is the default one where a breakpoint triggers. Below
you could have two groups:
- excluded
- active

The excluded group would never trigger a breakpoint.
Once a task in the root set triggers a breakpoint it will be moved into
to the active set. The eventfd notifcation API of cgroups could be used
to learn about this change.

The whole concept fails if the user does not move a single task into
the excluded group. To be overprotective here, I could try not do
anything until we have at least one pid in the "excluded" set.

So far I like this, it could be heat though.

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 5/5 v2] uprobes: add global breakpoints
  2012-08-13 11:35   ` Peter Zijlstra
  2012-08-20 15:27     ` Sebastian Andrzej Siewior
@ 2012-08-21 19:42     ` Sebastian Andrzej Siewior
  2012-08-22 13:53       ` Oleg Nesterov
  1 sibling, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-21 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, x86, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Srikar Dronamraju, Ananth N Mavinakaynahalli, stan_shebs,
	gdb-patches

By setting an uprobe tracepoint, one learns whenever a certain point
within a program is reached / passed. This is recorded and the
application continues.
This patch adds the ability to hold the program once this point has been
passed and the user may attach to the program via ptrace.
First, setup a global breakpoint which is very similar to a uprobe trace
point:

|echo 'g /home/bigeasy/uprobetest/sample:0x0000044d %ip %ax %bx' > uprobe_events

This is exactly what uprobe does except that it starts with the letter
'g' instead of 'p'.

Step two is to enable it:
|echo 1 > events/uprobes/enable

Step three is to add pids of prcocess which are excluded from global
breakpoints even if the process would hit one. This should ensure that
the debugger remains active and the global breakpoint on system libc's
malloc() does not freeze the system. A pid can be excluded by
| echo e $pid > uprobe_gb_exclude

You need atleast one pid in the exlude list. An entry can be removed by
| echo a $pid > uprobe_gb_exclude

Lets assume you execute ./sample and the breakpoint is hit. In ps you will
see:
|1938 pts/1    t+     0:00 ./sample

Now you can attach gdb via 'gdb -p 1938'. The gdb now can interact with
the tracee and inspect its registers or its stack, single step, let it
runâ€Š
In case the process is not of great interest, the user may continue
without gdb by writting its pid into the uprobe_gp_wakeup file

|echo 1938 > uprobe_gp_wakeup

Cc: gdb-patches@sourceware.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1..v2: 
 - closed the window between set state / check state
 - tried to address Peters review / concern:
   - added "uprobe_gb_exclude". This file contains a list of pids which
     are excluded from the "global breakpoint" behavior. The idea is to
     whitelist programs which are essential and must not hit a
     breakpoint. An empty list is invalid and _no_ global breakpoint will
     hit.
   - added "uprobe_gb_active". This file contains a list of pids which
     hit the global breakpoint. The user can poll() here and wait for
     the next victim. The size of the list limited. This is step two to
     ensure a global system lock up does not occur. If a java program is
     beeing debugged and the size of the list is too small then the list
     could be allocated at runtime with more entries.

   I've been thinking about alterntives to the approach above:
   - cgroups
     Would solve some problems. It would be very easy for the user to
     group tasks in two groups: "root" group with "allowed" tasks and
     sub group "excluded" for tasks which are excluded from the global
     breakpoint(s). A third group would be required to put the "halted"
     tasks. I would need one file to set the type of the group (root is
     easy, "allowed" and "halted" have to be set). The notification
     mechanism works on per file basis. So I would have to add file with
     no content just to let the user that the task file has new entries.
     All in all this looks like a abuse of cgroups just to follow forks
     on the exclude list and maintain the list.
   - auto exclude the read()er / poll()er of uprobe_gb_active
     This sounds lovely but has two short commings:
     - the pid of the process that opened it may change after fork()
       since the initial owner may exit
     - they may be two+ childs after fork() which read() / poll(). Both
       should be excluded since I don't kwnow which one is which. I don't
       know which one terminates because ->release() is called by last
       process that closes the fd. That means in this scenario I would
       add more entries to the while list than remove.
     - having a list of tasks which currently poll() the file would
       solve the problem with this endless growing list. However once
       poll() is done (one process just hit the global breakpoint) I have
       an empty list since no one can poll() now. That means that I
       would exclude every further process which hits the global breakpoint
       before someone poll()s again.

Oleg: The change in ptrace_attach() is still as it was. I tried to
address Peter concern here.
Now what options do I have here:
- not putting the task in TASK_TRACED but simply halt. This would work
  without a change to ptrace_attach() but the task continues on any
  signal. So a signal friendly task would continue and not notice a
  thing.
- putting the TASK_TRACED and not touching ptrace_attach(). Each
  ptrace() user would have to kick the task itself which means changes
  to gdb / strace. If this is the prefered way then I guess it can be
  done :)

 include/linux/uprobes.h     |   10 ++
 kernel/events/uprobes.c     |   13 +-
 kernel/ptrace.c             |    4 +-
 kernel/trace/trace_uprobe.c |  414 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 435 insertions(+), 6 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0fc6585..991a665 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -63,6 +63,9 @@ enum uprobe_task_state {
 	UTASK_SSTEP,
 	UTASK_SSTEP_ACK,
 	UTASK_SSTEP_TRAPPED,
+	UTASK_TRACE_SLEEP,
+	UTASK_TRACE_WOKEUP_NORMAL,
+	UTASK_TRACE_WOKEUP_TRACED,
 };
 
 /*
@@ -76,6 +79,7 @@ struct uprobe_task {
 
 	unsigned long			xol_vaddr;
 	unsigned long			vaddr;
+	int				skip_handler;
 };
 
 /*
@@ -120,6 +124,8 @@ extern bool uprobe_deny_signal(void);
 extern bool __weak arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void uprobe_clear_state(struct mm_struct *mm);
 extern void uprobe_reset_state(struct mm_struct *mm);
+extern int uprobe_wakeup_task(struct task_struct *t, int traced);
+
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
@@ -163,5 +169,9 @@ static inline void uprobe_clear_state(struct mm_struct *mm)
 static inline void uprobe_reset_state(struct mm_struct *mm)
 {
 }
+static inline int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+	return 0;
+}
 #endif /* !CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c8e5204..c140e03 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1513,7 +1513,16 @@ static void handle_swbp(struct pt_regs *regs)
 			goto cleanup_ret;
 	}
 	utask->active_uprobe = uprobe;
-	handler_chain(uprobe, regs);
+	if (utask->skip_handler)
+		utask->skip_handler = 0;
+	else
+		handler_chain(uprobe, regs);
+
+	if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
+		send_sig(SIGTRAP, current, 0);
+		utask->skip_handler = 1;
+		goto cleanup_ret;
+	}
 	if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs))
 		goto cleanup_ret;
 
@@ -1528,7 +1537,7 @@ cleanup_ret:
 		utask->active_uprobe = NULL;
 		utask->state = UTASK_RUNNING;
 	}
-	if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+	if (!(uprobe->flags & UPROBE_SKIP_SSTEP) || utask->skip_handler)
 
 		/*
 		 * cannot singlestep; cannot skip instruction;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a232bb5..5d6d3ed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	__ptrace_link(task, current);
 
 	/* SEIZE doesn't trap tracee on attach */
-	if (!seize)
+	if (!seize) {
 		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+		uprobe_wakeup_task(task, 1);
+	}
 
 	spin_lock(&task->sighand->siglock);
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f3c3811..693c50a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -22,6 +22,8 @@
 #include <linux/uaccess.h>
 #include <linux/uprobes.h>
 #include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/sort.h>
 
 #include "trace_probe.h"
 
@@ -48,6 +50,7 @@ struct trace_uprobe {
 	unsigned int			flags;	/* For TP_FLAG_* */
 	ssize_t				size;	/* trace entry size */
 	unsigned int			nr_args;
+	bool				is_gb;
 	struct probe_arg		args[];
 };
 
@@ -177,19 +180,24 @@ static int create_trace_uprobe(int argc, char **argv)
 	struct path path;
 	unsigned long offset;
 	bool is_delete;
+	bool is_gb;
 	int i, ret;
 
 	inode = NULL;
 	ret = 0;
 	is_delete = false;
+	is_gb = false;
 	event = NULL;
 	group = NULL;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
 		is_delete = true;
+	else if (argv[0][0] == 'g')
+		is_gb = true;
 	else if (argv[0][0] != 'p') {
-		pr_info("Probe definition must be started with 'p' or '-'.\n");
+		pr_info("Probe definition must be started with 'p', 'g' or "
+				"'-'.\n");
 		return -EINVAL;
 	}
 
@@ -277,7 +285,8 @@ static int create_trace_uprobe(int argc, char **argv)
 		if (ptr)
 			*ptr = '\0';
 
-		snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset);
+		snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx",
+				is_gb ? 'g' : 'p', tail, offset);
 		event = buf;
 		kfree(tail);
 	}
@@ -298,6 +307,8 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto error;
 	}
 
+	tu->is_gb = is_gb;
+
 	/* parse arguments */
 	ret = 0;
 	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
@@ -394,8 +405,12 @@ static int probes_seq_show(struct seq_file *m, void *v)
 {
 	struct trace_uprobe *tu = v;
 	int i;
+	char type = 'p';
+
+	if (tu->is_gb)
+		type = 'g';
 
-	seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
+	seq_printf(m, "%c:%s/%s", type, tu->call.class->system, tu->call.name);
 	seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);
 
 	for (i = 0; i < tu->nr_args; i++)
@@ -435,6 +450,366 @@ static const struct file_operations uprobe_events_ops = {
 	.write		= probes_write,
 };
 
+static int pidt_cmp(const void *a, const void *b)
+{
+	const pid_t *ap = a;
+	const pid_t *bp = b;
+
+	if (*ap != *bp)
+		return *ap > *bp ? 1 : -1;
+	return 0;
+}
+
+static pid_t* gb_pid_find(pid_t *first, pid_t *last, pid_t pid)
+{
+	while (first <= last) {
+		pid_t *mid;
+
+		mid = ((last - first) >> 1) + first;
+
+		if (*mid < pid)
+			first = mid + 1;
+		else if (*mid > pid)
+			last = mid - 1;
+		else
+			return mid;
+	}
+	return NULL;
+}
+
+static loff_t gb_read_reset(struct file *file, loff_t offset, int origin)
+{
+	if (offset != 0)
+		return -EINVAL;
+	if (origin != SEEK_SET)
+		return -EINVAL;
+	file->f_pos = 0;
+	return file->f_pos;
+}
+
+static DEFINE_MUTEX(gb_pid_lock);
+static DEFINE_MUTEX(gb_state_lock);
+
+static ssize_t gb_read(char __user *buffer, size_t count, loff_t *ppos,
+		pid_t *pids, u8 num_pids)
+{
+	char buf[800];
+	int left;
+	size_t wrote = 0;
+	int i;
+	int ret;
+
+	if (*ppos)
+		return 0;
+
+	left = min(sizeof(buf), count);
+
+	mutex_lock(&gb_pid_lock);
+	for (i = 0; (i < num_pids) && left - wrote > 0; i++) {
+		wrote += snprintf(&buf[wrote], left - wrote, "%d\n",
+				pids[i]);
+	}
+	mutex_unlock(&gb_pid_lock);
+
+	wrote = min(wrote, count);
+	ret = copy_to_user(buffer, buf, wrote);
+	if (ret)
+		return -EFAULT;
+	*ppos = 1;
+	return wrote;
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(gb_hit_ev_queue);
+static pid_t active_pids[64];
+static u8 num_active_pids;
+
+static int uprobe_gb_record(void)
+{
+	mutex_lock(&gb_pid_lock);
+	if (WARN_ON_ONCE(num_active_pids > ARRAY_SIZE(active_pids))) {
+		mutex_unlock(&gb_pid_lock);
+		return -ENOSPC;
+	}
+
+	active_pids[num_active_pids] = current->pid;
+	num_active_pids++;
+
+	sort(active_pids, num_active_pids, sizeof(pid_t),
+			pidt_cmp, NULL);
+	mutex_unlock(&gb_pid_lock);
+
+	wake_up_interruptible(&gb_hit_ev_queue);
+	return 0;
+}
+
+static pid_t* gb_active_find(pid_t pid)
+{
+	return gb_pid_find(&active_pids[0],
+			&active_pids[num_active_pids], pid);
+}
+
+static int uprobe_gb_remove_active(pid_t pid)
+{
+	pid_t *match;
+	u8 entry;
+
+	mutex_lock(&gb_pid_lock);
+	match = gb_active_find(pid);
+	if (!match) {
+		mutex_unlock(&gb_pid_lock);
+		return -EINVAL;
+	}
+
+	num_active_pids--;
+	entry = match - active_pids;
+	memcpy(&active_pids[entry], &active_pids[entry + 1],
+			(num_active_pids - entry) * sizeof(pid_t));
+	mutex_unlock(&gb_pid_lock);
+	return 0;
+}
+
+static unsigned int gb_poll(struct file *file, struct poll_table_struct *wait)
+{
+	poll_wait(file, &gb_hit_ev_queue, wait);
+	if (num_active_pids)
+		return POLLIN | POLLRDNORM;
+	return 0;
+}
+
+static ssize_t gb_active_read(struct file *file, char __user * buffer, size_t count,
+		loff_t *ppos)
+{
+	int ret;
+	ret = gb_read(buffer, count, ppos, active_pids, num_active_pids);
+	return ret;
+}
+
+int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+	struct uprobe_task *utask;
+	int ret = -EINVAL;
+
+	utask = t->utask;
+	if (!utask)
+		return ret;
+	mutex_lock(&gb_state_lock);
+	if (utask->state != UTASK_TRACE_SLEEP)
+		goto out;
+
+	uprobe_gb_remove_active(t->pid);
+
+	utask->state = traced ?
+		UTASK_TRACE_WOKEUP_TRACED : UTASK_TRACE_WOKEUP_NORMAL;
+	wake_up_state(t, __TASK_TRACED);
+	ret = 0;
+out:
+	mutex_unlock(&gb_state_lock);
+	return ret;
+}
+
+static int gp_continue_pid(const char *buf)
+{
+	struct task_struct *child;
+	unsigned long pid;
+	int ret;
+
+	if (isspace(*buf))
+		buf++;
+
+	ret = kstrtoul(buf, 0, &pid);
+	if (ret)
+		return ret;
+
+	rcu_read_lock();
+	child = find_task_by_vpid(pid);
+	if (child)
+		get_task_struct(child);
+	rcu_read_unlock();
+
+	if (!child)
+		return -EINVAL;
+
+	ret = uprobe_wakeup_task(child, 0);
+	put_task_struct(child);
+	return ret;
+}
+
+static ssize_t gp_active_write(struct file *filp,
+		const char __user *ubuf, size_t count, loff_t *ppos)
+{
+	char buf[32];
+	int ret;
+
+	if (count >= sizeof(buf))
+		return -ERANGE;
+	ret = copy_from_user(buf, ubuf, count);
+	if (ret)
+		return -EFAULT;
+	buf[count] = '\0';
+
+	switch (buf[0]) {
+		case 'c':
+			ret = gp_continue_pid(&buf[1]);
+			break;
+
+		default:
+			ret = -EINVAL;
+	};
+
+	if (ret < 0)
+		return ret;
+	return count;
+}
+
+static const struct file_operations uprobe_gp_active_ops = {
+	.owner		= THIS_MODULE,
+	.open		= simple_open,
+	.llseek		= gb_read_reset,
+	.read		= gb_active_read,
+	.write		= gp_active_write,
+	.poll		= gb_poll,
+};
+
+static pid_t exlcuded_pids[64];
+static u8 num_exlcuded_pids;
+
+static pid_t* gb_exclude_find(pid_t pid)
+{
+	return gb_pid_find(&exlcuded_pids[0],
+			&exlcuded_pids[num_exlcuded_pids], pid);
+}
+
+static int uprobe_gb_allowed(void)
+{
+	pid_t *match;
+
+	if (!num_exlcuded_pids) {
+		pr_err_once("Need atleast one PID which is excluded from the "
+				"global breakpoint. This should be the "
+				"debugging tool.\n");
+		return -EINVAL;
+	}
+	mutex_lock(&gb_pid_lock);
+	match = gb_exclude_find(current->pid);
+	mutex_unlock(&gb_pid_lock);
+	if (match)
+		return -EPERM;
+	return 0;
+}
+
+static int gp_exclude_pid(const char *buf)
+{
+	unsigned long pid;
+	pid_t *match;
+	int ret;
+
+	if (isspace(*buf))
+		buf++;
+	ret = kstrtoul(buf, 0, &pid);
+	if (ret)
+		return ret;
+
+	mutex_lock(&gb_pid_lock);
+	if (num_exlcuded_pids >= ARRAY_SIZE(exlcuded_pids)) {
+		ret = -E2BIG;
+		goto out;
+	}
+
+	match = gb_exclude_find(pid);
+	if (match) {
+		ret = 0;
+		goto out;
+	}
+
+	exlcuded_pids[num_exlcuded_pids] = pid;
+	num_exlcuded_pids++;
+
+	sort(exlcuded_pids, num_exlcuded_pids, sizeof(pid_t),
+			pidt_cmp, NULL);
+out:
+	mutex_unlock(&gb_pid_lock);
+	return ret;
+}
+
+static int gp_allow_pid(const char *buf)
+{
+	unsigned long pid;
+	pid_t *match;
+	u8 entry;
+	int ret;
+
+	if (isspace(*buf))
+		buf++;
+
+	ret = kstrtoul(buf, 0, &pid);
+	if (ret)
+		return ret;
+
+	mutex_lock(&gb_pid_lock);
+	match = gb_exclude_find(pid);
+	if (!match) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	num_exlcuded_pids--;
+	entry = match - exlcuded_pids;
+	memcpy(&exlcuded_pids[entry], &exlcuded_pids[entry + 1],
+			(num_exlcuded_pids - entry) * sizeof(pid_t));
+	ret = 0;
+out:
+	mutex_unlock(&gb_pid_lock);
+	return ret;
+}
+
+static ssize_t gp_exclude_write(struct file *filp,
+		const char __user *ubuf, size_t count, loff_t *ppos)
+{
+	char buf[32];
+	int ret;
+
+	if (count >= sizeof(buf))
+		return -ERANGE;
+	ret = copy_from_user(buf, ubuf, count);
+	if (ret)
+		return -EFAULT;
+	buf[count] = '\0';
+
+	switch (buf[0]) {
+	case 'e':
+		ret = gp_exclude_pid(&buf[1]);
+		break;
+
+	case 'a':
+		ret = gp_allow_pid(&buf[1]);
+		break;
+
+	default:
+		ret = -EINVAL;
+	};
+
+	if (ret < 0)
+		return ret;
+	return count;
+}
+
+static ssize_t gb_exclude_read(struct file *file, char __user *buffer,
+		size_t count, loff_t *ppos)
+{
+	int ret;
+
+	ret = gb_read(buffer, count, ppos, exlcuded_pids, num_exlcuded_pids);
+	return ret;
+}
+
+static const struct file_operations uprobe_gp_exclude_ops = {
+	.owner		= THIS_MODULE,
+	.open		= simple_open,
+	.llseek		= gb_read_reset,
+	.read		= gb_exclude_read,
+	.write		= gp_exclude_write,
+};
+
 /* Probes profiling interfaces */
 static int probes_profile_seq_show(struct seq_file *m, void *v)
 {
@@ -704,6 +1079,32 @@ int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type,
 	return 0;
 }
 
+static void uprobe_wait_traced(struct trace_uprobe *tu)
+{
+	struct uprobe_task *utask;
+	int ret;
+
+	ret = uprobe_gb_allowed();
+	if (ret)
+		return;
+
+	mutex_lock(&gb_state_lock);
+	utask = current->utask;
+	utask->state = UTASK_TRACE_SLEEP;
+
+	set_current_state(TASK_TRACED);
+	ret = uprobe_gb_record();
+	if (ret < 0) {
+		utask->state = UTASK_TRACE_WOKEUP_NORMAL;
+		set_current_state(TASK_RUNNING);
+		mutex_unlock(&gb_state_lock);
+		return;
+	}
+	mutex_unlock(&gb_state_lock);
+
+	schedule();
+}
+
 static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 {
 	struct uprobe_trace_consumer *utc;
@@ -721,6 +1122,9 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 	if (tu->flags & TP_FLAG_PROFILE)
 		uprobe_perf_func(tu, regs);
 #endif
+	if (tu->is_gb)
+		uprobe_wait_traced(tu);
+
 	return 0;
 }
 
@@ -779,6 +1183,10 @@ static __init int init_uprobe_trace(void)
 
 	trace_create_file("uprobe_events", 0644, d_tracer,
 				    NULL, &uprobe_events_ops);
+	trace_create_file("uprobe_gb_exclude", 0644, d_tracer,
+				    NULL, &uprobe_gp_exclude_ops);
+	trace_create_file("uprobe_gb_active", 0644, d_tracer,
+				    NULL, &uprobe_gp_active_ops);
 	/* Profile interface */
 	trace_create_file("uprobe_profile", 0444, d_tracer,
 				    NULL, &uprobe_profile_ops);
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5 v2] uprobes: add global breakpoints
  2012-08-21 19:42     ` [RFC 5/5 v2] " Sebastian Andrzej Siewior
@ 2012-08-22 13:53       ` Oleg Nesterov
  2012-08-27 18:57         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2012-08-22 13:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, linux-kernel, x86, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Ananth N Mavinakaynahalli, stan_shebs,
	gdb-patches

On 08/21, Sebastian Andrzej Siewior wrote:
>
> This patch adds the ability to hold the program once this point has been
> passed and the user may attach to the program via ptrace.

Sorry Sebastian, I didn't even try to read the patch ;) Fortunately I am
not maintainer, I can only reapeat that you do not need to convince me.

> Oleg: The change in ptrace_attach() is still as it was. I tried to
> address Peter concern here.
> Now what options do I have here:
> - not putting the task in TASK_TRACED but simply halt. This would work
>   without a change to ptrace_attach() but the task continues on any
>   signal. So a signal friendly task would continue and not notice a
>   thing.

TASK_KILLABLE

> - putting the TASK_TRACED

This is simply wrong, in many ways.

For example, what if the probed task is already ptraced? Or debugger
attaches via PTRACE_SEIZE? How can debugger know it is stopped?
uprobe_wait_traced() goes to sleep in TASK_TRACED without notification.
And it does not set ->exit_code, this means do_wait() won't work.
And note ptrace_stop()->recalc_sigpending_tsk().

> @@ -76,6 +79,7 @@ struct uprobe_task {
>
>  	unsigned long			xol_vaddr;
>  	unsigned long			vaddr;
> +	int				skip_handler;

I am trying to guess what this skip_handler does...

> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1513,7 +1513,16 @@ static void handle_swbp(struct pt_regs *regs)
>  			goto cleanup_ret;
>  	}
>  	utask->active_uprobe = uprobe;
> -	handler_chain(uprobe, regs);
> +	if (utask->skip_handler)
> +		utask->skip_handler = 0;
> +	else
> +		handler_chain(uprobe, regs);
> +
> +	if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
> +		send_sig(SIGTRAP, current, 0);
> +		utask->skip_handler = 1;
> +		goto cleanup_ret;
> +	}
>  	if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs))
>  		goto cleanup_ret;
>  
> @@ -1528,7 +1537,7 @@ cleanup_ret:
>  		utask->active_uprobe = NULL;
>  		utask->state = UTASK_RUNNING;
>  	}
> -	if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
> +	if (!(uprobe->flags & UPROBE_SKIP_SSTEP) || utask->skip_handler)

Am I understand correctly?

If it was woken by PTRACE_ATTACH we set utask->skip_handler = 1 and
re-execute the instruction (yes, SIGTRAP, but this doesn't matter).
When the task hits this bp again we skip handler_chain() because it
was already reported.

Yes? If yes, I don't think this can work. Suppose that the task
dequeues a signal before it returns to the usermode to re-execute
and enters the signal handler which can hit another uprobe.

And this can race with uprobe_register() afaics.

Oleg.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5 v2] uprobes: add global breakpoints
  2012-08-22 13:53       ` Oleg Nesterov
@ 2012-08-27 18:57         ` Sebastian Andrzej Siewior
  2012-08-29 15:48           ` Oleg Nesterov
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-27 18:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, linux-kernel, x86, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Ananth N Mavinakaynahalli, stan_shebs,
	gdb-patches

On 08/22/2012 03:48 PM, Oleg Nesterov wrote:
> On 08/21, Sebastian Andrzej Siewior wrote:
>>
>> This patch adds the ability to hold the program once this point has been
>> passed and the user may attach to the program via ptrace.
>
> Sorry Sebastian, I didn't even try to read the patch ;) Fortunately I am
> not maintainer, I can only reapeat that you do not need to convince me.

At least for the ptrace part I would prefer to have your blessing
instead something that seems to work but is wrong.

>> Oleg: The change in ptrace_attach() is still as it was. I tried to
>> address Peter concern here.
>> Now what options do I have here:
>> - not putting the task in TASK_TRACED but simply halt. This would work
>>    without a change to ptrace_attach() but the task continues on any
>>    signal. So a signal friendly task would continue and not notice a
>>    thing.
>
> TASK_KILLABLE

That would help but would require a change in ptrace_attach() or
something in gdb/strace/Â…

One thing I just noticed: If I don't register a handler for SIGUSR1 and
send one to the application while it is in TASK_KILLABLE then the
signal gets delivered. If I register a signal handler for it than it
gets blocked and delivered once I resume the task.
Shouldn't it get blocked even if I don't register a handler for it?

>> - putting the TASK_TRACED
>
> This is simply wrong, in many ways.
>
> For example, what if the probed task is already ptraced? Or debugger
> attaches via PTRACE_SEIZE? How can debugger know it is stopped?
> uprobe_wait_traced() goes to sleep in TASK_TRACED without notification.
> And it does not set ->exit_code, this means do_wait() won't work.
> And note ptrace_stop()->recalc_sigpending_tsk().

Okay, okay. It looks like it is better to stick with TASK_KILLABLE
instead of fixing the issues you pointed out.

>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1513,7 +1513,16 @@ static void handle_swbp(struct pt_regs *regs)
>>   			goto cleanup_ret;
>>   	}
>>   	utask->active_uprobe = uprobe;
>> -	handler_chain(uprobe, regs);
>> +	if (utask->skip_handler)
>> +		utask->skip_handler = 0;
>> +	else
>> +		handler_chain(uprobe, regs);
>> +
>> +	if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
>> +		send_sig(SIGTRAP, current, 0);
>> +		utask->skip_handler = 1;
>> +		goto cleanup_ret;
>> +	}
>>   	if (uprobe->flags&  UPROBE_SKIP_SSTEP&&  can_skip_sstep(uprobe, regs))
>>   		goto cleanup_ret;
>>
>> @@ -1528,7 +1537,7 @@ cleanup_ret:
>>   		utask->active_uprobe = NULL;
>>   		utask->state = UTASK_RUNNING;
>>   	}
>> -	if (!(uprobe->flags&  UPROBE_SKIP_SSTEP))
>> +	if (!(uprobe->flags&  UPROBE_SKIP_SSTEP) || utask->skip_handler)
>
> Am I understand correctly?
>
> If it was woken by PTRACE_ATTACH we set utask->skip_handler = 1 and
> re-execute the instruction (yes, SIGTRAP, but this doesn't matter).
> When the task hits this bp again we skip handler_chain() because it
> was already reported.
>
> Yes? If yes, I don't think this can work. Suppose that the task
> dequeues a signal before it returns to the usermode to re-execute
> and enters the signal handler which can hit another uprobe.

ach, those signals make everything complicated. I though signals are
blocked until the single step is done but my test just showed my
something different. Okay, what now? A simple nested struct uprobe_task
and struct uprobe? Blocking signals isn't probably a good idea.

> And this can race with uprobe_register() afaics.

> Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5 v2] uprobes: add global breakpoints
  2012-08-27 18:57         ` Sebastian Andrzej Siewior
@ 2012-08-29 15:48           ` Oleg Nesterov
  2012-08-30 20:42             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2012-08-29 15:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, linux-kernel, x86, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Ananth N Mavinakaynahalli, stan_shebs,
	gdb-patches

On 08/27, Sebastian Andrzej Siewior wrote:
>
> On 08/22/2012 03:48 PM, Oleg Nesterov wrote:
>> On 08/21, Sebastian Andrzej Siewior wrote:
>>>
>>> - not putting the task in TASK_TRACED but simply halt. This would work
>>>    without a change to ptrace_attach() but the task continues on any
>>>    signal. So a signal friendly task would continue and not notice a
>>>    thing.
>>
>> TASK_KILLABLE
>
> That would help but would require a change in ptrace_attach() or
> something in gdb/strace/â€¦

Well, I still think you should not touch ptrace_attach() at all.

> One thing I just noticed: If I don't register a handler for SIGUSR1 and
> send one to the application while it is in TASK_KILLABLE then the
> signal gets delivered.

Not really delivered... OK, it can be delivered (dequeued) before
the task sees SIGKILL, but this can be changed.

In short: in this case the task is correctly SIGKILL'ed. See sig_fatal()
in complete_signal().

> If I register a signal handler for it than it
> gets blocked and delivered once I resume the task.

Sure, if you have a handler, the signal is not fatal.

> Shouldn't it get blocked even if I don't register a handler for it?

No.

>> Am I understand correctly?
>>
>> If it was woken by PTRACE_ATTACH we set utask->skip_handler = 1 and
>> re-execute the instruction (yes, SIGTRAP, but this doesn't matter).
>> When the task hits this bp again we skip handler_chain() because it
>> was already reported.
>>
>> Yes? If yes, I don't think this can work. Suppose that the task
>> dequeues a signal before it returns to the usermode to re-execute
>> and enters the signal handler which can hit another uprobe.
>
> ach, those signals make everything complicated. I though signals are
> blocked until the single step is done

Yes, see uprobe_deny_signal().

> but my test just showed my
> something different.

I guess you missed the UTASK_SSTEP_TRAPPED logic.

But this doesn't matter. Surely we must not "block" signals _after_
the single step is done, and this is the problem.

> Okay, what now?

IMHO: don't do this ;)

> Blocking signals isn't probably a good idea.

This is bad and wrong idea, I think.

And, once again. Whatever you do, you can race with uprobe_register().
I mean, you must never expect that the task will hit the same uprobe
again, even if you are going to re-execute the same insn.

Oleg.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5 v2] uprobes: add global breakpoints
  2012-08-29 15:48           ` Oleg Nesterov
@ 2012-08-30 20:42             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-08-30 20:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, linux-kernel, x86, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Ananth N Mavinakaynahalli, stan_shebs,
	gdb-patches

On 08/29/2012 05:49 PM, Oleg Nesterov wrote:
>> That would help but would require a change in ptrace_attach() or
>> something in gdb/strace/â€¦
>
> Well, I still think you should not touch ptrace_attach() at all.

Okay.

>> One thing I just noticed: If I don't register a handler for SIGUSR1 and
>> send one to the application while it is in TASK_KILLABLE then the
>> signal gets delivered.
>
> Not really delivered... OK, it can be delivered (dequeued) before
> the task sees SIGKILL, but this can be changed.
>
> In short: in this case the task is correctly SIGKILL'ed. See sig_fatal()
> in complete_signal().
>
>> If I register a signal handler for it than it
>> gets blocked and delivered once I resume the task.
>
> Sure, if you have a handler, the signal is not fatal.
>
>> Shouldn't it get blocked even if I don't register a handler for it?
>
> No.

Now, that I read again it looks like a brain fart on my side.

>> ach, those signals make everything complicated. I though signals are
>> blocked until the single step is done
>
> Yes, see uprobe_deny_signal().
>
>> but my test just showed my
>> something different.
>
> I guess you missed the UTASK_SSTEP_TRAPPED logic.
>
> But this doesn't matter. Surely we must not "block" signals _after_
> the single step is done, and this is the problem.
>
>> Okay, what now?
>
> IMHO: don't do this ;)
>
>> Blocking signals isn't probably a good idea.
>
> This is bad and wrong idea, I think.
>
> And, once again. Whatever you do, you can race with uprobe_register().
> I mean, you must never expect that the task will hit the same uprobe
> again, even if you are going to re-execute the same insn.

After witting why I think you are wrong I understood what you meant :)
So let me try to get this rightâ€¦

>
> Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-08-30 20:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1344355952-2382-1-git-send-email-bigeasy@linutronix.de>
2012-08-07 16:13 ` [RFC 5/5] uprobes: add global breakpoints Sebastian Andrzej Siewior
2012-08-08 13:19   ` Oleg Nesterov
2012-08-09 17:18     ` Sebastian Andrzej Siewior
2012-08-13 13:20       ` Oleg Nesterov
2012-08-14 11:44         ` Sebastian Andrzej Siewior
2012-08-09 18:25   ` Stan Shebs
2012-08-13 11:35   ` Peter Zijlstra
2012-08-20 15:27     ` Sebastian Andrzej Siewior
2012-08-21 19:42     ` [RFC 5/5 v2] " Sebastian Andrzej Siewior
2012-08-22 13:53       ` Oleg Nesterov
2012-08-27 18:57         ` Sebastian Andrzej Siewior
2012-08-29 15:48           ` Oleg Nesterov
2012-08-30 20:42             ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).