public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
       [not found]       ` <20181129114652.3696d6d7@gandalf.local.home>
@ 2018-11-30  2:27         ` Masami Hiramatsu
  2018-11-30  3:24           ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Masami Hiramatsu @ 2018-11-30  2:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, systemtap,
	Alexei Starovoitov, Daniel Borkmann

On Thu, 29 Nov 2018 11:46:52 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 29 Nov 2018 23:29:27 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > One way to solve this is to also have a counter array that gets updated
> > > every time the index array gets updated. And save the counter to the
> > > shadow stack index as well. This way, we only call the return if the
> > > counter on the stack matches what's in the counter on the counter array
> > > for the index.  
> > 
> > Hmm, but we already know the current stack "header" entry when calling
> > handlers, don't we? I thought we just calcurate out from curr_ret_stack.
> 
> Basically we have this:
> 
>  array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> 
> On entry of function we do:
> 
	push header(including original ret_addr) onto ret_stack

> 	for (i = 0; i < array_entries; i++) {
> 		if (array[i]->entryfunc(...)) {
> 			push i onto ret_stack;
> 		}
> 	}
> 
> On the return side, we do:
> 
> 	idx = pop ret_stack;
> 
> 	array[idx]->retfunc(...);

So at this point we have the header on ret_stack, don't we? :)

Anyway, I think we may provide an API for unwinder to find correct
original return address form ret_stack. That is OK for me.


> 
> We only call the retfunc of a fgraph_ops if it returned non-zero from
> its entryfunc(). The return can happen a long time from now (which is
> why I don't save the &fgraph_ops on the ret_stack, because then we would
> never be able to free it).
> 
> In the mean time, lets say we unregistered (and freed) fgraph_ops_2 and
> then added fgraph_ops_3, so the array looks like:
> 
>  array: | &fgraph_ops_1 | &fgraph_ops_3 | &fgraph_ops_stub | ...
> 
> Then a function that was called when fgraph_ops_2 was on the stack
> returns, it will call array[1]->retfunc() which now belongs to
> fgraph_ops_3 and not fgraph_ops_2.
> 
> But if we add a counter array that gets updated when new ops are added
> to the array, we have this:
> 
>  cnt_array: |       4       |     2         |       0          |
>      array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> 
> And do:
> 
> 	for (i = 0; i < array_entries; i++) {
> 		if (array[i]->entryfunc(...)) {
> 			idx = cnt_array[i] << 8 | i;
> 			push idx onto ret_stack;
> 		}
> 	}
> 
> Then on return we have:
> 
> 	idx = pop ret_stack;
> 
> 	if (idx >> 8 == cnt_array[idx & 0xff])
> 		array[idx & 0xff]->retfunc(...);
> 
> It wouldn't call fgraph_ops_3 because we would change the cnt_array
> when we remove fgraph_ops_2 and the return would not match, as
> cnt_array[1] would then be "3".
> 
> > 
> > > > By the way, are there any way to hold a private data on each ret_stack entry?
> > > > Since kretprobe supports "entry data" passed from entry_handler to
> > > > return handler, we have to store the data or data-instance on the ret_stack.
> > > > 
> > > > This feature is used by systemtap to save the function entry data, like
> > > > function parameters etc., so that return handler analyzes the parameters
> > > > with return value.  
> > > 
> > > Yes, I remember you telling me about this at plumbers, and while I was
> > > writing this code I had that in mind. It wouldn't be too hard to
> > > implement, I just left it off for now. I also left it off because I
> > > have some questions about what exactly is needed. What size do you
> > > require to be stored. Especially if we want to keep the shadow stack
> > > smaller. I was going to find a way to implement some of the data that
> > > is already stored via the ret_stack with this instead, and make the
> > > ret_stack entry smaller. Should we allow just sizeof(long)*3? or just
> > > let user choose any size and if they run out of stack, then too bad. We
> > > just wont let it crash.  
> > 
> > I need only sizeof(unsigned long). If the kretprobe user requires more,
> > it will be fall back to current method -- get an "instance" and store
> > its address to the entry :-)
> 
> Awesome, then this shouldn't be too hard to implement.

Oops, anyway I noticed that I must store a value on each area so that we can
identify which kretprobe is using that if there are several kretprobes on same
function. So, kretprobe implementation will be something like below.

kretprobe_retfunc(trace, regs)
{
	kp = get_kprobe(trace->func);

	if (private == to_kretprobe(kp)) // this is directly mapped to current kprobe
		goto found_kretprobe;

	if (!list_empty(&kp->list)) {	// we need to find from multiple kretprobes
		list_for_each_entry(kp, &kp->list, list)
			if (private == kp)
				goto found_kretprobe;
	}

	// Or this must be an instance
	struct kretprobe_instance *ri = trace->private;
	rp = ri->rp;
	if (valid_kretprobe(rp))
		rp->handler(ri, regs);
	kretprobe_recycle_instance(ri);
	goto out;

found_kretprobe:
	struct kretprobe_instance rii = {.rp = to_kretprobe(kp),
		.ret_addr=trace->ret, .task = current}
	rp->handler(&rii, regs);

out:
	return 0;
}

I think we talked about pt_regs, which is redundant for return probe, so it should
be just a return value. (but how we pass it? trace->retval?)
That is OK for ftrace (but the transition needs more code).
And I would like to ask ebpf and systemtap people that is OK since it will change
the kernel ABI.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-30  2:27         ` [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Masami Hiramatsu
@ 2018-11-30  3:24           ` Steven Rostedt
  2018-11-30 14:12             ` Masami Hiramatsu
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2018-11-30  3:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, systemtap,
	Alexei Starovoitov, Daniel Borkmann

On Fri, 30 Nov 2018 11:26:58 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Thu, 29 Nov 2018 11:46:52 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, 29 Nov 2018 23:29:27 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >   
> > > > One way to solve this is to also have a counter array that gets updated
> > > > every time the index array gets updated. And save the counter to the
> > > > shadow stack index as well. This way, we only call the return if the
> > > > counter on the stack matches what's in the counter on the counter array
> > > > for the index.    
> > > 
> > > Hmm, but we already know the current stack "header" entry when calling
> > > handlers, don't we? I thought we just calcurate out from curr_ret_stack.  
> > 
> > Basically we have this:
> > 
> >  array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> > 
> > On entry of function we do:
> >   
> 	push header(including original ret_addr) onto ret_stack

We can't put the ret_addr of the callback on the stack. What if that
ret_addr is a module, and it gets unloaded? We must not call it.

> 
> > 	for (i = 0; i < array_entries; i++) {
> > 		if (array[i]->entryfunc(...)) {
> > 			push i onto ret_stack;
> > 		}
> > 	}
> > 
> > On the return side, we do:
> > 
> > 	idx = pop ret_stack;
> > 
> > 	array[idx]->retfunc(...);  
> 
> So at this point we have the header on ret_stack, don't we? :)
> 
> Anyway, I think we may provide an API for unwinder to find correct
> original return address form ret_stack. That is OK for me.

Yes. In fact, I have something that worked for that. I'll have to test
it some more.


> > > I need only sizeof(unsigned long). If the kretprobe user requires more,
> > > it will be fall back to current method -- get an "instance" and store
> > > its address to the entry :-)  
> > 
> > Awesome, then this shouldn't be too hard to implement.  
> 
> Oops, anyway I noticed that I must store a value on each area so that we can
> identify which kretprobe is using that if there are several kretprobes on same
> function. So, kretprobe implementation will be something like below.
> 
> kretprobe_retfunc(trace, regs)
> {
> 	kp = get_kprobe(trace->func);
> 
> 	if (private == to_kretprobe(kp)) // this is directly mapped to current kprobe
> 		goto found_kretprobe;
> 
> 	if (!list_empty(&kp->list)) {	// we need to find from multiple kretprobes
> 		list_for_each_entry(kp, &kp->list, list)
> 			if (private == kp)
> 				goto found_kretprobe;
> 	}
> 
> 	// Or this must be an instance
> 	struct kretprobe_instance *ri = trace->private;
> 	rp = ri->rp;
> 	if (valid_kretprobe(rp))
> 		rp->handler(ri, regs);
> 	kretprobe_recycle_instance(ri);
> 	goto out;
> 
> found_kretprobe:
> 	struct kretprobe_instance rii = {.rp = to_kretprobe(kp),
> 		.ret_addr=trace->ret, .task = current}
> 	rp->handler(&rii, regs);
> 
> out:
> 	return 0;
> }
> 
> I think we talked about pt_regs, which is redundant for return probe, so it should
> be just a return value. (but how we pass it? trace->retval?)

Yeah, we can add that.

> That is OK for ftrace (but the transition needs more code).
> And I would like to ask ebpf and systemtap people that is OK since it will change
> the kernel ABI.

I agree.

-- Steve

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users
  2018-11-30  3:24           ` Steven Rostedt
@ 2018-11-30 14:12             ` Masami Hiramatsu
  0 siblings, 0 replies; 3+ messages in thread
From: Masami Hiramatsu @ 2018-11-30 14:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, systemtap,
	Alexei Starovoitov, Daniel Borkmann

On Thu, 29 Nov 2018 22:24:35 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 30 Nov 2018 11:26:58 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > On Thu, 29 Nov 2018 11:46:52 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Thu, 29 Nov 2018 23:29:27 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >   
> > > > > One way to solve this is to also have a counter array that gets updated
> > > > > every time the index array gets updated. And save the counter to the
> > > > > shadow stack index as well. This way, we only call the return if the
> > > > > counter on the stack matches what's in the counter on the counter array
> > > > > for the index.    
> > > > 
> > > > Hmm, but we already know the current stack "header" entry when calling
> > > > handlers, don't we? I thought we just calcurate out from curr_ret_stack.  
> > > 
> > > Basically we have this:
> > > 
> > >  array: | &fgraph_ops_1 | &fgraph_ops_2 | &fgraph_ops_stub | ...
> > > 
> > > On entry of function we do:
> > >   
> > 	push header(including original ret_addr) onto ret_stack
> 
> We can't put the ret_addr of the callback on the stack. What if that
> ret_addr is a module, and it gets unloaded? We must not call it.

But in that case, how can we recover the original addr on the kernel (real)
stack? I don't call the entry, but kretprobe handler will need the info
to record as a caller-address.

> > 
> > > 	for (i = 0; i < array_entries; i++) {
> > > 		if (array[i]->entryfunc(...)) {
> > > 			push i onto ret_stack;
> > > 		}
> > > 	}
> > > 
> > > On the return side, we do:
> > > 
> > > 	idx = pop ret_stack;
> > > 
> > > 	array[idx]->retfunc(...);  
> > 
> > So at this point we have the header on ret_stack, don't we? :)
> > 
> > Anyway, I think we may provide an API for unwinder to find correct
> > original return address form ret_stack. That is OK for me.
> 
> Yes. In fact, I have something that worked for that. I'll have to test
> it some more.

Great! I think it will be enough for kretprobe.

> > > > I need only sizeof(unsigned long). If the kretprobe user requires more,
> > > > it will be fall back to current method -- get an "instance" and store
> > > > its address to the entry :-)  
> > > 
> > > Awesome, then this shouldn't be too hard to implement.  
> > 
> > Oops, anyway I noticed that I must store a value on each area so that we can
> > identify which kretprobe is using that if there are several kretprobes on same
> > function. So, kretprobe implementation will be something like below.
> > 
> > kretprobe_retfunc(trace, regs)
> > {
> > 	kp = get_kprobe(trace->func);
> > 
> > 	if (private == to_kretprobe(kp)) // this is directly mapped to current kprobe
> > 		goto found_kretprobe;
> > 
> > 	if (!list_empty(&kp->list)) {	// we need to find from multiple kretprobes
> > 		list_for_each_entry(kp, &kp->list, list)
> > 			if (private == kp)
> > 				goto found_kretprobe;
> > 	}
> > 
> > 	// Or this must be an instance
> > 	struct kretprobe_instance *ri = trace->private;
> > 	rp = ri->rp;
> > 	if (valid_kretprobe(rp))
> > 		rp->handler(ri, regs);
> > 	kretprobe_recycle_instance(ri);
> > 	goto out;
> > 
> > found_kretprobe:
> > 	struct kretprobe_instance rii = {.rp = to_kretprobe(kp),
> > 		.ret_addr=trace->ret, .task = current}
> > 	rp->handler(&rii, regs);
> > 
> > out:
> > 	return 0;
> > }
> > 
> > I think we talked about pt_regs, which is redundant for return probe, so it should
> > be just a return value. (but how we pass it? trace->retval?)
> 
> Yeah, we can add that.

OK, then I will start with making a fake pt_regs on stack and call handler,
which will be something like,

	struct pt_regs regs = {};
	regs_set_return_value(&regs, trace->retval);
	rp->handler(ri, &regs);

Thank you,

> > That is OK for ftrace (but the transition needs more code).
> > And I would like to ask ebpf and systemtap people that is OK since it will change
> > the kernel ABI.
> 
> I agree.
> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-11-30 14:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20181122012708.491151844@goodmis.org>
     [not found] ` <20181126182112.422b914dd00ecb36e15f7b07@kernel.org>
     [not found]   ` <20181126113215.4259d473@gandalf.local.home>
     [not found]     ` <20181129232927.74ca5f294e97fc58b15bf8c6@kernel.org>
     [not found]       ` <20181129114652.3696d6d7@gandalf.local.home>
2018-11-30  2:27         ` [RFC][PATCH 00/14] function_graph: Rewrite to allow multiple users Masami Hiramatsu
2018-11-30  3:24           ` Steven Rostedt
2018-11-30 14:12             ` Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).