RE: Review patches of user space kprobe

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* RE: Review patches of user space kprobe
@ 2006-01-06  4:27 Zhang, Yanmin
  2006-01-06 12:28 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  4:27 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Zhang, Yanmin
>>Sent: 2006年1月6日 10:52
>>To: prasanna@in.ibm.com
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: RE: Review patches of user space kprobe
>>
>>>>-----Original Message-----
>>>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>>>Sent: 2006年1月5日 19:14
>>>>To: Zhang, Yanmin
>>>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>>>Subject: Re: Review patches of user space kprobe
>>>>
>>>>> >>+ */
>>>>> >>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
>>>>> long offset)
>>>>> >>+{
>>>>> >>+	struct hlist_head *head;
>>>>> >>+	struct hlist_node *node;
>>>>> >>+	struct kprobe *p;
>>>>> >>+
>>>>> >>+	head = &kprobe_table[hash_long((unsigned long)inode * offset,
>>>>> >>+				       KPROBE_HASH_BITS)];
>>>>> >>+	hlist_for_each_entry(p, node, head, hlist) {
>>>>> >>+		if (p->pre_handler == aggr_pre_handler)
>>>>> >>+			return p;
>>>>> >>+		else {
>>>>> >>+			struct uprobe *user = container_of(p,
>>>>> >>+							struct uprobe,
>>>>> kp);
>>>>> Kprobe and uprobe share the same hash table. Does p here always point to
>>>>> uprobe?
>>>>
>>>>Check can be made before accessig uprobe.
>>>>if (!kernel_text_address((unsigned long)p->addr))
>>Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called
>>incorrectly. The logic is *not clear*.
I might misunderstand your reply. If put the check before using container_of in function get_uprobe_at, the issue could be resolved.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  4:27 Review patches of user space kprobe Zhang, Yanmin
@ 2006-01-06 12:28 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06 12:28 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>>>> >>+	head = &kprobe_table[hash_long((unsigned long)inode * offset,
> >>>>> >>+				       KPROBE_HASH_BITS)];
> >>>>> >>+	hlist_for_each_entry(p, node, head, hlist) {
> >>>>> >>+		if (p->pre_handler == aggr_pre_handler)
> >>>>> >>+			return p;
> >>>>> >>+		else {
> >>>>> >>+			struct uprobe *user = container_of(p,
> >>>>> >>+							struct uprobe,
> >>>>> kp);
> >>>>> Kprobe and uprobe share the same hash table. Does p here always point to
> >>>>> uprobe?
> >>>>
> >>>>Check can be made before accessig uprobe.
> >>>>if (!kernel_text_address((unsigned long)p->addr))
> >>Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called
> >>incorrectly. The logic is *not clear*.
> I might misunderstand your reply. If put the check before using container_of in function get_uprobe_at, the issue could be resolved.

even I meant the same thing of adding a check before we access uprobe structure using container_of.


Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-09  2:06 Zhang, Yanmin
  0 siblings, 0 replies; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-09  2:06 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月6日 18:26
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>> >>This is done to make use of register_kprobe(), the address returned by
>>> >>kmap_atomic is passed to register_kprobe() and even though the kernel data
>>> >>address is stored at kp.addr, that is enough to distinguish between
>>> >>kernel probes.
>>> Is it true on all arch? I don't think it's safe to do so. Actually, register_kprobe just needs know it's a uprobe.
>>yes, otherwise need to find out a way to pass the mapped address to register_kprobe(). All this can be avoided
>>if register_kprobe() is bypassed :).
It's a good solution if register_kprobe is bypassed.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-09  2:04 Zhang, Yanmin
  0 siblings, 0 replies; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-09  2:04 UTC (permalink / raw)
  To: Roland McGrath, prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Roland McGrath
>>Sent: 2006年1月6日 18:31
>>To: prasanna@in.ibm.com
>>Cc: Zhang, Yanmin; systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> I think the dll's share the same vma's.
>>
>>You cannot rely on mapping addresses being the same, they are not always
>>so.  There are always separate vm_area_struct's in separate processes even
>>when they appear to be identical.
I agree with Roland. Although sometimes they appear to be identical, the vma themselves are different. And a vma might be released/reallocated before readpage/readpages are called. It's easy to introduce race condition when storing temp member in uprobe.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-09  1:48 Zhang, Yanmin
  0 siblings, 0 replies; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-09  1:48 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月6日 17:32
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>> are only executed after the loop of hlist_for_each_entry. Is it correct?
>>> >>>
>>> >>
>>> >>that's correct. The page is mapped only once for the first match in the loop,
>>> >>then all the probes are inserted into that page in the hlist_for_each loop and
>>> >>then the page is unmapped only once after the end of the loop.
>>> So call lock_page for many times, and call unlock_page for one time?
>>>
>>no, lock_page and unlock_page is called only once, there is a small correction,
>>insert_probe_page() should be moved out of if condition so that all probes
>>are inserted on that page.
I'm still confused. Anyway, I will wait for your new codes. In addition, I still think the second parameter of map_uprobe_page is not good and could be deleted. The logic related to the second parameter is very simple and could be moved to the caller of map_uprobe_page. 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06 10:22 ` Prasanna S Panchamukhi
@ 2006-01-06 10:30   ` Roland McGrath
  0 siblings, 0 replies; 32+ messages in thread
From: Roland McGrath @ 2006-01-06 10:30 UTC (permalink / raw)
  To: prasanna; +Cc: Zhang, Yanmin, systemtap, Keshavamurthy, Anil S, Mao, Bibo

> I think the dll's share the same vma's.

You cannot rely on mapping addresses being the same, they are not always
so.  There are always separate vm_area_struct's in separate processes even
when they appear to be identical.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  9:08 Zhang, Yanmin
@ 2006-01-06 10:22 ` Prasanna S Panchamukhi
  2006-01-06 10:30   ` Roland McGrath
  0 siblings, 1 reply; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06 10:22 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>> >>If vma's are temporarily defined, then readpage hooks should walk
> >>> >>through the vma list to get the vma. To reduce the time spent in
> >>> >>readpage/readpages hooks in fastpath, we store it while registeration.
> >>> Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?
> >>
> >>Do you mean the same page is mapped in different vma?
> Map the page in different vma of different processes (address spaces). I think it's common. For example, libc.so is mapped to many processes. Every process has its own vma to map the page.
> 

I think the dll's share the same vma's.
below is the output of different process libc mmaps.
./1/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./1/maps:0082f000-00832000 rw-p 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./1996/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./1996/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2000/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2000/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2011/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2011/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./.2014/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./.2014/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2031/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2031/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2032/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2032/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2033/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2033/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2034/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2034/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2035/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2035/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2397/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so

> >>
> >>This is done to make use of register_kprobe(), the address returned by
> >>kmap_atomic is passed to register_kprobe() and even though the kernel data
> >>address is stored at kp.addr, that is enough to distinguish between
> >>kernel probes.
> Is it true on all arch? I don't think it's safe to do so. Actually, register_kprobe just needs know it's a uprobe.
yes, otherwise need to find out a way to pass the mapped address to register_kprobe(). All this can be avoided
if register_kprobe() is bypassed :).

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  9:12 Zhang, Yanmin
@ 2006-01-06  9:28 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06  9:28 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>> are only executed after the loop of hlist_for_each_entry. Is it correct?
> >>>
> >>
> >>that's correct. The page is mapped only once for the first match in the loop,
> >>then all the probes are inserted into that page in the hlist_for_each loop and
> >>then the page is unmapped only once after the end of the loop.
> So call lock_page for many times, and call unlock_page for one time?
> 
no, lock_page and unlock_page is called only once, there is a small correction, 
insert_probe_page() should be moved out of if condition so that all probes
are inserted on that page.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-06  9:12 Zhang, Yanmin
  2006-01-06  9:28 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  9:12 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月6日 17:12
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>readpage() routine reads one page at a time. we map the page one time and walk the
>>> >>probes list for this inode, insert all the probes within this page and then unmap it.
>>> Yes, I did see codes inserting all the probes on the page, but below codes:
>>> +		if (kprobe_page_mapped) {
>>> +			unmap_uprobe_page(up);
>>> +			unlock_page(up->page);
>>> +		}
>>>
>>> are only executed after the loop of hlist_for_each_entry. Is it correct?
>>>
>>
>>that's correct. The page is mapped only once for the first match in the loop,
>>then all the probes are inserted into that page in the hlist_for_each loop and
>>then the page is unmapped only once after the end of the loop.
So call lock_page for many times, and call unlock_page for one time?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  5:29 Zhang, Yanmin
@ 2006-01-06  9:08 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06  9:08 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>readpage() routine reads one page at a time. we map the page one time and walk the
> >>probes list for this inode, insert all the probes within this page and then unmap it.
> Yes, I did see codes inserting all the probes on the page, but below codes:
> +		if (kprobe_page_mapped) {
> +			unmap_uprobe_page(up);
> +			unlock_page(up->page);
> +		}
> 
> are only executed after the loop of hlist_for_each_entry. Is it correct?
> 

that's correct. The page is mapped only once for the first match in the loop, 
then all the probes are inserted into that page in the hlist_for_each loop and
then the page is unmapped only once after the end of the loop.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-06  9:08 Zhang, Yanmin
  2006-01-06 10:22 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  9:08 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:prasanna@in.ibm.com]
>>Sent: 2006年1月6日 16:57
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>>uprobes in uprobe_module. Uprobes just need keep its own specific info.
>>> >>
>>> >>inode and offset combination uniquely identifies individual user space
>>> >>probe,
>>> What does "individual user space probe" mean here?
>>>
>>
>>Here individual user space probe mean, the place where the breakpoint
>>instruction is inserted.
>>
>>>Any uprobe will be shared by all processes who map the same page. Right?
>>
>>that's right.
>>
>>>
>>>  hence when a probe is hit, combination inode and offset is used to
>>> >>identify each probe. Keeping inode pointer in struct uprobe clearly
>>> >>reflects this concept.
>>> What's the relationship between uprobe and uprobe_module? All uprobes linked by uprobe_module->ulist_head have the same inode, right?
>>If it's correct, why not to just include inode in uprobe_module?
>>
>>This can be done, but in such a case for every probe hit, the uprobe_module list much be traversed just to get the inode.
>>
>>> >>>>		struct vm_area_struct *vma;
>>> >>>Item page and vma are used temporarily. They are not used when the probe
>>> >>>is hit. To shrink struct uprobe, they could be deleted. A new super
>>> >>>struct could be defined to include struct uprobe, or always to search
>>> >>>them when they are needed. Application might be inserted huge uprobes.
>>> >>>It's better to keep the struct smaller.
>>> >>
>>> >>If vma's are temporarily defined, then readpage hooks should walk
>>> >>through the vma list to get the vma. To reduce the time spent in
>>> >>readpage/readpages hooks in fastpath, we store it while registeration.
>>> Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?
>>
>>Do you mean the same page is mapped in different vma?
Map the page in different vma of different processes (address spaces). I think it's common. For example, libc.so is mapped to many processes. Every process has its own vma to map the page.

>>As of now we are not supporting such cases.
>>
>>> >>understand what you mean by "assign a fixed address in user space"
>>> map_uprobe_page sets uprobe->kp.addr to address returned by kmap_atomic, then insert_probe_page will add offset to uprobe->kp.addr.
>>Later, register_kprobe uses uprobe->kp.addr to estimate if the address is in kernel text. Is it wrong? As for why you didn't hit it,
>>I think mostly now uprobe->kp.addr is pointing to the kernel data area, but it doesn't point to user space. Right?
>>
>>This is done to make use of register_kprobe(), the address returned by
>>kmap_atomic is passed to register_kprobe() and even though the kernel data
>>address is stored at kp.addr, that is enough to distinguish between
>>kernel probes.
Is it true on all arch? I don't think it's safe to do so. Actually, register_kprobe just needs know it's a uprobe.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  5:22 Zhang, Yanmin
@ 2006-01-06  9:04 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06  9:04 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>User space probes support insertion of probes on dynamically linked libraries
> >>and even probes can be inserted on the text pages that are not even loaded
> >>into the memory.
> It doesn't resolve case b). 
> 

Need to look how to support such cases.

> 
> >>
> >>>3) Can function register_userspace_probe do not call register_kprobe? I =
> >>>think it's not necessary. It's just my feeling. It's up to you to make =
> >>>decision. :)
> >>
> >>register_kprobe already does most of what userspace probe registeration needs.
> Function register_kprobe is not big. Current register_userspace_probe calls register_kprobe, then register_kprobe calls back to uprobe-specific functions. It looks confusing. Why not to just bypass register_kprobe?
> 

Let me check how the code looks if we bypass register_kprobe().

> >>
> >>could you please elaborate this.
> Under smp environment, if a page (inode,offset) is mapped to the address spaces of many processes, then when a uprobe is registered on the page, uprobe->vma is just point to one of them. Then, insert_kprobe_user=>arch_arm_uprobe=>flush_icache_user_range, only this vma (or address space) is flushed.
> 
At present only the vma containing the page is flused, if the pages are present
on other vmas, even the probes should be inserted on such pages and those vma
need to be flused. 

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  3:20 Zhang, Yanmin
@ 2006-01-06  8:53 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06  8:53 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>>uprobes in uprobe_module. Uprobes just need keep its own specific info.
> >>
> >>inode and offset combination uniquely identifies individual user space
> >>probe,
> What does "individual user space probe" mean here? 
> 

Here individual user space probe mean, the place where the breakpoint 
instruction is inserted. 

>Any uprobe will be shared by all processes who map the same page. Right?

that's right.

> 
>  hence when a probe is hit, combination inode and offset is used to
> >>identify each probe. Keeping inode pointer in struct uprobe clearly
> >>reflects this concept.
> What's the relationship between uprobe and uprobe_module? All uprobes linked by uprobe_module->ulist_head have the same inode, right? If it's correct, why not to just include inode in uprobe_module?

This can be done, but in such a case for every probe hit, the uprobe_module list much be traversed just to get the inode.

> >>>>		struct vm_area_struct *vma;
> >>>Item page and vma are used temporarily. They are not used when the probe
> >>>is hit. To shrink struct uprobe, they could be deleted. A new super
> >>>struct could be defined to include struct uprobe, or always to search
> >>>them when they are needed. Application might be inserted huge uprobes.
> >>>It's better to keep the struct smaller.
> >>
> >>If vma's are temporarily defined, then readpage hooks should walk
> >>through the vma list to get the vma. To reduce the time spent in
> >>readpage/readpages hooks in fastpath, we store it while registeration.
> Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?

Do you mean the same page is mapped in different vma?
As of now we are not supporting such cases.

> >>understand what you mean by "assign a fixed address in user space"
> map_uprobe_page sets uprobe->kp.addr to address returned by kmap_atomic, then insert_probe_page will add offset to uprobe->kp.addr. Later, register_kprobe uses uprobe->kp.addr to estimate if the address is in kernel text. Is it wrong? As for why you didn't hit it, I think mostly now uprobe->kp.addr is pointing to the kernel data area, but it doesn't point to user space. Right? 

This is done to make use of register_kprobe(), the address returned by 
kmap_atomic is passed to register_kprobe() and even though the kernel data 
address is stored at kp.addr, that is enough to distinguish between 
kernel probes.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-06  2:52 Zhang, Yanmin
@ 2006-01-06  6:53 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-06  6:53 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>> >>+							struct uprobe,
> >>> kp);
> >>> Kprobe and uprobe share the same hash table. Does p here always point to
> >>> uprobe?
> >>
> >>Check can be made before accessig uprobe.
> >>if (!kernel_text_address((unsigned long)p->addr))
> Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called incorrectly. The logic is *not clear*.

This was all done to handle aggrigate uprobes. get_uprobe() is called after 
checking if the address is not kernel_text_address(). get_uprobe_at() returns 
kprobe structures from the kprobe hash list. If the kprobe structure returned 
is a aggrigate kprobe structure, the aggrigate hash list is walked to get the 
individual kprobe structure that is included in the uprobe structure. The page 
containing the probe address is located and locked in the memory, so that the
breakpoint instruction can be replaced with original instruction, if the single
stepping out-of-line could not be achieved. But as of now we are not handling
the case when single stepping out-of-line fails, which need to be handled.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-06  5:29 Zhang, Yanmin
  2006-01-06  9:08 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  5:29 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:prasanna@in.ibm.com]
>>Sent: 2006年1月5日 19:10
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>>
>>>
>>> >>+					kprobe_page_mapped = 1;
>>> >>+					retval = insert_probe_page(up);
>>> >>+				}
>>> >>+			}
>>> >>+		}
>>> >>+		if (kprobe_page_mapped) {
>>> The logic here is incorrect. If there are many uprobes at the same page,
>>> up just points to the last one. How about others?
>>>
>>
>>readpage() routine reads one page at a time. we map the page one time and walk the
>>probes list for this inode, insert all the probes within this page and then unmap it.
Yes, I did see codes inserting all the probes on the page, but below codes:
+		if (kprobe_page_mapped) {
+			unmap_uprobe_page(up);
+			unlock_page(up->page);
+		}

are only executed after the loop of hlist_for_each_entry. Is it correct?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-06  5:22 Zhang, Yanmin
  2006-01-06  9:04 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  5:22 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:prasanna@in.ibm.com]
>>Sent: 2006年1月5日 18:33
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>
>>>General question:
>>>1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 =
>>>cases related to the question.
>>>	a) Many applications execute codes produced themselves, such like JIT =
>>>(Just-In-Time) of JVM.
>>
>>At present we do not support it, need to look into such a case.
>>
>>>	b) Some executables include TEXTREL section. When they are loaded into =
>>>memory and linked dynamically, the text section might be changed, and =
>>>kernel will do a Copy-On-Write to create a new anonymous page and map =
>>>the new page to the process address space. So after the process starts, =
>>>we couldn't insert uprobe on its copied pages.
>>>Should a new interface be added to support it? The parameters could be =
>>>process id and offset in the process address space. Of course, it could =
>>>be an enhancement and implemented later.
>>
>>User space probes support insertion of probes on dynamically linked libraries
>>and even probes can be inserted on the text pages that are not even loaded
>>into the memory.
It doesn't resolve case b). 


>>
>>>3) Can function register_userspace_probe do not call register_kprobe? I =
>>>think it's not necessary. It's just my feeling. It's up to you to make =
>>>decision. :)
>>
>>register_kprobe already does most of what userspace probe registeration needs.
Function register_kprobe is not big. Current register_userspace_probe calls register_kprobe, then register_kprobe calls back to uprobe-specific functions. It looks confusing. Why not to just bypass register_kprobe?


>>
>>>2) Function get_inode_ops should take care of errors and its caller, =
>>>register_userspace_probe should check if the return value of =
>>>get_inode_ops is IS_ERR. If so, the error code should be returned =
>>>instead of a hard-coded -ENOSYS.
>>
>>Next patch release will take care of these things.
>>
>>>>>>+			spin_unlock(&mapping->i_mmap_lock);
>>>>>>+			return vma;
>>>It's not safe to return vma without lock. There is a race condition. If =
>>>vma is released by another thread, kernel might be crazy when this =
>>>thread tries to access it later.
>>
>>>If the page is mapped to many vma in different processes, function =
>>>find_get_vma just returns one vma. It's not enough.
>>>I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach =
>>>loop.
>>
>>could you please elaborate this.
Under smp environment, if a page (inode,offset) is mapped to the address spaces of many processes, then when a uprobe is registered on the page, uprobe->vma is just point to one of them. Then, insert_kprobe_user=>arch_arm_uprobe=>flush_icache_user_range, only this vma (or address space) is flushed.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-06  3:20 Zhang, Yanmin
  2006-01-06  8:53 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  3:20 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:prasanna@in.ibm.com]
>>Sent: 2006年1月5日 18:31
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>Yanmin,
>>
>>Thanks for your comments, Please see my comments inline below.
>>
>>>>>		/*inode of the application */
>>>>>		struct inode *inode;
>>>How about using a pointer to uprobe_module to locate the inode? Although
>>>the storage space is same, it's better to keep all the same info of
>>>uprobes in uprobe_module. Uprobes just need keep its own specific info.
>>
>>inode and offset combination uniquely identifies individual user space
>>probe,
What does "individual user space probe" mean here? Any uprobe will be shared by all processes who map the same page. Right?


 hence when a probe is hit, combination inode and offset is used to
>>identify each probe. Keeping inode pointer in struct uprobe clearly
>>reflects this concept.
What's the relationship between uprobe and uprobe_module? All uprobes linked by uprobe_module->ulist_head have the same inode, right? If it's correct, why not to just include inode in uprobe_module?


>>
>>
>>>>		/*offset within the page */
>>>This comment is incorrect. Functions, such like
>>>register_userspace_probe, use it as the offset from the beginning of the
>>>file, not a page of the file. Right? Although your example uses it as an
>>>offset of page. Frank's reply said this item could be deleted. I don't
>>>think he is correct because uprobe->kp.addr is used when calling
>>>register_kprobe.
>>
>>Next patch release will take care of this, offset is hidden from the user and user need
>>not initialize it.
>>
>>
>>>>		unsigned long offset;=20
>>>>		/*page containing probes */
>>>>		struct page *page;
>>>>		/* virtual memory area containing the probes */
>>>>		struct vm_area_struct *vma;
>>>Item page and vma are used temporarily. They are not used when the probe
>>>is hit. To shrink struct uprobe, they could be deleted. A new super
>>>struct could be defined to include struct uprobe, or always to search
>>>them when they are needed. Application might be inserted huge uprobes.
>>>It's better to keep the struct smaller.
>>
>>If vma's are temporarily defined, then readpage hooks should walk
>>through the vma list to get the vma. To reduce the time spent in
>>readpage/readpages hooks in fastpath, we store it while registeration.
Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?


>>
>>
>>
>>>>	};
>>>>
>>>>	struct uprobe_module {
>>>>		struct hlist_head ulist_head;
>>>>		/* hlist head containing list of probes within this
>>module */=20
>>>>		struct list_head mlist;
>>>>		/* list of uprobes_modules */
>>>>		struct inode *inode;
>>>Item inode could be deleted. Use uprobe_module->nd.dentry->d_inode.
>>
>>This will be taken care in the next patch release.
>>
>>
>>>>		/* inode of the application on which probes are
>>implanted */
>>>>		struct nameidata nd;
>>>>		/* to hold path/dentry etc. */
>>>Struct nameidata looks a little big. A pointer to dentry might be
>>>enough.
>>
>>nameidata is allocated per module/application and not for every probe.
My above questions answer this one.


>>At present the reference count to the mount point is held until
>>the probes is unregistered. I will relook at this issue.
>>
>>>>+	if (!kernel_text_address((unsigned long)p->addr))
>>>>+		addr =3D insert_kprobe_user(p);
>>>After calling insert_kprobe_user, p->addr is still equal to the return
>>>value of kmap_atomic. So when the uprobe is hit, how to find it from the
>>>hash table?
>>
>>Here it does not return just the mapped address, but inode * offset as the address.
>>Each probe is uniquely identified by the combination of inode and offset, kprobe_handler
>>does the same thing.
>>
>>>And I didn't see kprobe_handler is changed to call
>>>get_kprobe_user if uprobe is hit.
>>
>>kprobe_handler() calls get_uprobe_at() when the probe is hit.
>>
>>>>+ */
>>>>+static int map_uprobe_page(struct uprobe *up, int mapped)
>>>The second parameter seems useless.
>>
>>Its being used in the readpage patch.
>>
>>
>>
>>>>+{
>>>>+{
>>>>+	up->kp.addr =3D (kprobe_opcode_t *) ((unsigned long)up->kp.addr +
>>>>+				 (unsigned long)(up->offset &
>>~PAGE_MASK));
>>Does it guarantee later register_kprobe could distinguish it's a user
>>probe? Perhaps it's better to assign a fixed address in user space, such
>>like 0.
>>
>>Yes,register_kprobe() distinguishes user space probes. I could not
>>understand what you mean by "assign a fixed address in user space"
map_uprobe_page sets uprobe->kp.addr to address returned by kmap_atomic, then insert_probe_page will add offset to uprobe->kp.addr. Later, register_kprobe uses uprobe->kp.addr to estimate if the address is in kernel text. Is it wrong? As for why you didn't hit it, I think mostly now uprobe->kp.addr is pointing to the kernel data area, but it doesn't point to user space. Right? 
A fix address means a value out of kernel space. Such like a value under PAGE_OFFSET on i386.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-06  2:52 Zhang, Yanmin
  2006-01-06  6:53 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-06  2:52 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月5日 19:14
>>To: Zhang, Yanmin
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>+ */
>>> >>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
>>> long offset)
>>> >>+{
>>> >>+	struct hlist_head *head;
>>> >>+	struct hlist_node *node;
>>> >>+	struct kprobe *p;
>>> >>+
>>> >>+	head = &kprobe_table[hash_long((unsigned long)inode * offset,
>>> >>+				       KPROBE_HASH_BITS)];
>>> >>+	hlist_for_each_entry(p, node, head, hlist) {
>>> >>+		if (p->pre_handler == aggr_pre_handler)
>>> >>+			return p;
>>> >>+		else {
>>> >>+			struct uprobe *user = container_of(p,
>>> >>+							struct uprobe,
>>> kp);
>>> Kprobe and uprobe share the same hash table. Does p here always point to
>>> uprobe?
>>
>>Check can be made before accessig uprobe.
>>if (!kernel_text_address((unsigned long)p->addr))
Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called incorrectly. The logic is *not clear*.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2006-01-05  7:09 Zhang, Yanmin
@ 2006-01-05 11:27 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-05 11:27 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> The 4th patch is to avoid probes misses in smp environment. I just have
> some initial comments because the patch is not complete.
> 

> >>+
> >>+	if (vma->vm_flags & VM_GROWSDOWN) {
> >>+		page_addr = PAGE_ALIGN(stack_addr);
> It's wrong to use PAGE_ALIGN here. I think mostly page_addr is bigger
> than stack_addr. Pls. use stack_addr&PAGE_MASK.

This will be fixed in the next release.
> 
> 
> >>+		if (((stack_addr - sizeof(long long)) - page_addr) >
> size) {
> Pay attention to the type of stack_add and page_addr. They are unsigned
> long. So it's better to change it to:
> if ((stack_addr - sizeof(long long)) > (size + page_addr)) {
> 
> And why consider a sizeof(long long) here? 
> 

For the safety reasons, some space is left below the stack pointer, so that the probed 
instruction may use this space while single stepping.

> >> {
> >>@@ -140,10 +178,13 @@ static inline void prepare_singlestep(st
> >> 	if (p->opcode == BREAKPOINT_INSTRUCTION)
> >> 		regs->eip = (unsigned long)p->addr;
> >> 	else {
> >>-		if (!kernel_text_address((unsigned long)p->addr)) {
> >>-			arch_disarm_uprobe(current_uprobe);
> >>-			regs->eip = (unsigned long)uprobe_addr;
> >>-		} else
> >>+		if (!kernel_text_address((unsigned long)p->addr))
> >>+			uprobe_single_step(current_uprobe, regs);
> 1) If uprobe_single_step returns -EFAULT, how does the thread go ahead?
> Note the first byte of the original instruction is break now, so the
> instruction from the second byte is wrong.

As of now it is not be handled but, it can be handled by allowing the page fault to
succeed or expanding the stack space or  replacing the original instruction if nothing
 can be done.

> 2) If prepare_singlestep succeeds, but later the real instruction of
> single step might cause a fatal error, for example, to access illegal
> address, then the process will be killed. Would the current
> kprobe(uprobe) environment be restored clearly? Such like kprobe_cpu,
> kprobe_status and so on.
> 

If its an illegal instruction, kprobe_fault_handler() is given control,
so that it can take care of restoring it cleanly.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2005-12-22 13:24 Zhang, Yanmin
@ 2006-01-05 11:10 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-05 11:10 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>+ */
> >>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
> long offset)
> >>+{
> >>+	struct hlist_head *head;
> >>+	struct hlist_node *node;
> >>+	struct kprobe *p;
> >>+
> >>+	head = &kprobe_table[hash_long((unsigned long)inode * offset,
> >>+				       KPROBE_HASH_BITS)];
> >>+	hlist_for_each_entry(p, node, head, hlist) {
> >>+		if (p->pre_handler == aggr_pre_handler)
> >>+			return p;
> >>+		else {
> >>+			struct uprobe *user = container_of(p,
> >>+							struct uprobe,
> kp);
> Kprobe and uprobe share the same hash table. Does p here always point to
> uprobe?

Check can be made before accessig uprobe.
if (!kernel_text_address((unsigned long)p->addr))

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2005-12-22  5:41 Zhang, Yanmin
  2005-12-22  6:00 ` Vara Prasad
@ 2006-01-05 11:06 ` Prasanna S Panchamukhi
  1 sibling, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-05 11:06 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> >>+	struct page *page;
> >>+	struct uprobe_module *m;
> >>+	struct uprobe *up = NULL;
> >>+	struct hlist_node *node;
> >>+
> >>+	m = get_module_by_inode(file->f_dentry->d_inode);
> There is a race condition between unregister_userspace_probe and here.
> If a thread jumps to the beginning of function up_readpages while
> another thread calls unregister_userspace_probe to delete the um, the
> first thread might return error.

Some locking/semaphore can be used to avoid this. 

> >>+						lock_page(up->page);
> The first patch doesn't do lock_page before calling insert_probe_page.
> Why does this patch do so? It's inconsistent.
> 

Next patch release will take care of this.

> >>+	int kprobe_page_mapped = 0;
> >>+	struct hlist_node *node;
> >>+
> >>+	m = get_module_by_inode(file->f_dentry->d_inode);
> The same race condition like above function.
> 

Some locking/semaphore can be used to avoid this. 
> PAGE_CACHE_SHIFT,
> >>+					page->index <<
> PAGE_CACHE_SHIFT)) {
> >>+				up->page = page;
> >>+				if (!map_uprobe_page(up,
> kprobe_page_mapped)) {
> >>+					lock_page(up->page);
> Same inconsistent issue.
> 
> 
> >>+					kprobe_page_mapped = 1;
> >>+					retval = insert_probe_page(up);
> >>+				}
> >>+			}
> >>+		}
> >>+		if (kprobe_page_mapped) {
> The logic here is incorrect. If there are many uprobes at the same page,
> up just points to the last one. How about others? 
> 

readpage() routine reads one page at a time. we map the page one time and walk the 
probes list for this inode, insert all the probes within this page and then unmap it.

Thanks
Prasanna
-- 

Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2005-12-22  5:34 Zhang, Yanmin
@ 2006-01-05 10:30 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-05 10:30 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

> 
> register_aggr_kprobe needs to be aware of user space probe.
> 
this is already taken care off.

-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2005-12-22  5:09 Zhang, Yanmin
@ 2006-01-05 10:29 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-05 10:29 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo


>General question:
>1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 =
>cases related to the question.
>	a) Many applications execute codes produced themselves, such like JIT =
>(Just-In-Time) of JVM.

At present we do not support it, need to look into such a case.

>	b) Some executables include TEXTREL section. When they are loaded into =
>memory and linked dynamically, the text section might be changed, and =
>kernel will do a Copy-On-Write to create a new anonymous page and map =
>the new page to the process address space. So after the process starts, =
>we couldn't insert uprobe on its copied pages.
>Should a new interface be added to support it? The parameters could be =
>process id and offset in the process address space. Of course, it could =
>be an enhancement and implemented later.

User space probes support insertion of probes on dynamically linked libraries 
and even probes can be inserted on the text pages that are not even loaded
into the memory.

>3) Can function register_userspace_probe do not call register_kprobe? I =
>think it's not necessary. It's just my feeling. It's up to you to make =
>decision. :)

register_kprobe already does most of what userspace probe registeration needs.

>2) Function get_inode_ops should take care of errors and its caller, =
>register_userspace_probe should check if the return value of =
>get_inode_ops is IS_ERR. If so, the error code should be returned =
>instead of a hard-coded -ENOSYS.

Next patch release will take care of these things.

>>>>+			spin_unlock(&mapping->i_mmap_lock);
>>>>+			return vma;
>It's not safe to return vma without lock. There is a race condition. If =
>vma is released by another thread, kernel might be crazy when this =
>thread tries to access it later.

>If the page is mapped to many vma in different processes, function =
>find_get_vma just returns one vma. It's not enough.
>I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach =
>loop.

could you please elaborate this.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2005-12-21  8:31 Zhang, Yanmin
@ 2006-01-05 10:28 ` Prasanna S Panchamukhi
  0 siblings, 0 replies; 32+ messages in thread
From: Prasanna S Panchamukhi @ 2006-01-05 10:28 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

Yanmin,

Thanks for your comments, Please see my comments inline below.

>>>		/*inode of the application */
>>>		struct inode *inode;
>How about using a pointer to uprobe_module to locate the inode? Although
>the storage space is same, it's better to keep all the same info of
>uprobes in uprobe_module. Uprobes just need keep its own specific info.

inode and offset combination uniquely identifies individual user space
probe, hence when a probe is hit, combination inode and offset is used to
identify each probe. Keeping inode pointer in struct uprobe clearly 
reflects this concept.


>>		/*offset within the page */
>This comment is incorrect. Functions, such like
>register_userspace_probe, use it as the offset from the beginning of the
>file, not a page of the file. Right? Although your example uses it as an
>offset of page. Frank's reply said this item could be deleted. I don't
>think he is correct because uprobe->kp.addr is used when calling
>register_kprobe.

Next patch release will take care of this, offset is hidden from the user and user need
not initialize it.


>>		unsigned long offset;=20
>>		/*page containing probes */
>>		struct page *page;
>>		/* virtual memory area containing the probes */
>>		struct vm_area_struct *vma;
>Item page and vma are used temporarily. They are not used when the probe
>is hit. To shrink struct uprobe, they could be deleted. A new super
>struct could be defined to include struct uprobe, or always to search
>them when they are needed. Application might be inserted huge uprobes.
>It's better to keep the struct smaller.

If vma's are temporarily defined, then readpage hooks should walk 
through the vma list to get the vma. To reduce the time spent in
readpage/readpages hooks in fastpath, we store it while registeration.



>>	};
>>
>>	struct uprobe_module {
>>		struct hlist_head ulist_head;
>>		/* hlist head containing list of probes within this
module */=20
>>		struct list_head mlist;
>>		/* list of uprobes_modules */
>>		struct inode *inode;
>Item inode could be deleted. Use uprobe_module->nd.dentry->d_inode.

This will be taken care in the next patch release.


>>		/* inode of the application on which probes are
implanted */
>>		struct nameidata nd;
>>		/* to hold path/dentry etc. */
>Struct nameidata looks a little big. A pointer to dentry might be
>enough.

nameidata is allocated per module/application and not for every probe.
At present the reference count to the mount point is held until 
the probes is unregistered. I will relook at this issue.
 
>>+	if (!kernel_text_address((unsigned long)p->addr))
>>+		addr =3D insert_kprobe_user(p);
>After calling insert_kprobe_user, p->addr is still equal to the return
>value of kmap_atomic. So when the uprobe is hit, how to find it from the
>hash table? 

Here it does not return just the mapped address, but inode * offset as the address. 
Each probe is uniquely identified by the combination of inode and offset, kprobe_handler
does the same thing.

>And I didn't see kprobe_handler is changed to call
>get_kprobe_user if uprobe is hit.

kprobe_handler() calls get_uprobe_at() when the probe is hit.

>>+ */
>>+static int map_uprobe_page(struct uprobe *up, int mapped)
>The second parameter seems useless.

Its being used in the readpage patch. 



>>+{
>>+{
>>+	up->kp.addr =3D (kprobe_opcode_t *) ((unsigned long)up->kp.addr +
>>+				 (unsigned long)(up->offset &
~PAGE_MASK));
Does it guarantee later register_kprobe could distinguish it's a user
probe? Perhaps it's better to assign a fixed address in user space, such
like 0.

Yes,register_kprobe() distinguishes user space probes. I could not 
understand what you mean by "assign a fixed address in user space"

>>+	up->kp.opcode =3D 0;
>It's not necessary to initiate kp.opcode as zero here.

This will be taken care in the next patch release.


>>+	list_for_each_entry(um, &uprobe_module_list, mlist) {
>There is no any lock to protect uprobe_module_list. It's not safe under
>multi-task environment.

This will be fixed in the next release .


>>+		path_release(&nd);
>>+		printk("Failed to lookup the path\n");
>>+		return NULL;
>>+	}
>>+	inode =3D nd.dentry->d_inode;
>>+	path_release(&nd);
>>+	return get_module_by_inode(inode);

>It's better to move path_release after calling get_module_by_inode in
>case of potential race condition.

Later in the patch, we increment the writecount on the inode. get_module_by_inode
just walks the uprobe_module list.

>>+
>>+	um =3D kcalloc(1, sizeof(struct uprobe_module), GFP_KERNEL);
>If kcalloc returns error, it should be checked here.

This will be taken care in the next patch release.

>>+	goto out;
>>+
>>+err:
>>+	kfree(um);
>It looks like you forgot "return NULL" here.

It is taken care off now.


>>+	mapping =3D up->inode->i_mapping;
>>+	up->vma =3D find_get_vma(up->offset, mapping);
>>+	up->page =3D find_get_page(mapping, (up->offset >>
PAGE_CACHE_SHIFT));
>>+
>>+	if (!map_uprobe_page(up, 0)) {
>If map_uprobe_page(up,0) !=3D 0, register_userspace_probe still returns 0
>meaning success register. It's incorrect and might confuse users.
>How about change it to:
>if (!(error=3Dmap_uprobe_page(up, 0))) {

Proper return of error value in case of failure is taken care 
in next patch release.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-25044636

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2006-01-05  7:09 Zhang, Yanmin
  2006-01-05 11:27 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2006-01-05  7:09 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

The 4th patch is to avoid probes misses in smp environment. I just have
some initial comments because the patch is not complete.

See the inline comments below.

Yanmin

>>---
>>
>> linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c |   52
++++++++++++++++++++---
>> 1 files changed, 46 insertions(+), 6 deletions(-)
>>
>>diff -puN arch/i386/kernel/kprobes.c~kprobes_avoid_smp_missprobes
arch/i386/kernel/kprobes.c
>>---
linux-2.6.13/arch/i386/kernel/kprobes.c~kprobes_avoid_smp_missprobes
2005-09-15 13:56:24.105192248 +0530
>>+++ linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c	2005-09-15
13:56:24.137187384 +0530
>>+/*
>>+ * This routine provides the functionality of single stepping out of
line by
>>+ * copying the original instruction in the user process free stack
space.
>>+ */
>>+static inline int uprobe_single_step(struct uprobe *p, struct pt_regs
*regs)
>>+{
>>+	unsigned long page_addr, stack_addr = regs->esp;
>>+	struct vm_area_struct *vma = NULL;
>>+	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>+
>>+	if (!(vma = find_vma(current->mm, PAGE_ALIGN(stack_addr)))) {
>>+		printk("No vma found\n");
>>+		return -ENOENT;
>>+	}
>>+
>>+	if (vma->vm_flags & VM_GROWSDOWN) {
>>+		page_addr = PAGE_ALIGN(stack_addr);
It's wrong to use PAGE_ALIGN here. I think mostly page_addr is bigger
than stack_addr. Pls. use stack_addr&PAGE_MASK.


>>+		if (((stack_addr - sizeof(long long)) - page_addr) >
size) {
Pay attention to the type of stack_add and page_addr. They are unsigned
long. So it's better to change it to:
if ((stack_addr - sizeof(long long)) > (size + page_addr)) {

And why consider a sizeof(long long) here? 


>>+			if (copy_to_user((unsigned long *)(page_addr +
size),
>>+					(unsigned long
*)&p->kp.ainsn.insn,
>>+					 size))
>>+				return -EFAULT;
>>+			regs->eip = page_addr + size;
>>+		}
>>+	} else {
>>+		page_addr = PAGE_ALIGN(stack_addr) + PAGE_SIZE;
Similar issue.


>>+		if ((page_addr - (stack_addr + sizeof(long long))) >
size) {
Similar issue.


>>+			if (copy_to_user((unsigned long *)page_addr,
>>+					(unsigned long
*)&p->kp.ainsn.insn,
>>+\					 size))
>>+				return -EFAULT;
>>+			regs->eip = page_addr;
>>+		}
>>+	}
>>+	singlestep_addr = (kprobe_opcode_t *)regs->eip;
>>+	return 0;
>>+}
>> 
>> static inline void prepare_singlestep(struct kprobe *p, struct
pt_regs *regs)
>> {
>>@@ -140,10 +178,13 @@ static inline void prepare_singlestep(st
>> 	if (p->opcode == BREAKPOINT_INSTRUCTION)
>> 		regs->eip = (unsigned long)p->addr;
>> 	else {
>>-		if (!kernel_text_address((unsigned long)p->addr)) {
>>-			arch_disarm_uprobe(current_uprobe);
>>-			regs->eip = (unsigned long)uprobe_addr;
>>-		} else
>>+		if (!kernel_text_address((unsigned long)p->addr))
>>+			uprobe_single_step(current_uprobe, regs);
1) If uprobe_single_step returns -EFAULT, how does the thread go ahead?
Note the first byte of the original instruction is break now, so the
instruction from the second byte is wrong.
2) If prepare_singlestep succeeds, but later the real instruction of
single step might cause a fatal error, for example, to access illegal
address, then the process will be killed. Would the current
kprobe(uprobe) environment be restored clearly? Such like kprobe_cpu,
kprobe_status and so on.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2005-12-22 13:24 Zhang, Yanmin
  2006-01-05 11:10 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2005-12-22 13:24 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

Below inline are the comments for patch 3/3.

Yanmin

>>Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-handlers
kernel/kprobes.c
>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-handlers
2005-09-14 11:49:28.780123648 +0530
>>+++ linux-2.6.13-prasanna/kernel/kprobes.c	2005-09-14
11:49:28.835115288 +0530
>>@@ -52,7 +52,8 @@ static struct list_head uprobe_module_li
>>+/*
>>+ * Walk through the kprobe hlist and get the matching userspace probe
>>+ * structure with the given inode and offset.
>>+ */
>>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
long offset)
>>+{
>>+	struct hlist_head *head;
>>+	struct hlist_node *node;
>>+	struct kprobe *p;
>>+
>>+	head = &kprobe_table[hash_long((unsigned long)inode * offset,
>>+				       KPROBE_HASH_BITS)];
>>+	hlist_for_each_entry(p, node, head, hlist) {
>>+		if (p->pre_handler == aggr_pre_handler)
>>+			return p;
>>+		else {
>>+			struct uprobe *user = container_of(p,
>>+							struct uprobe,
kp);
Kprobe and uprobe share the same hash table. Does p here always point to
uprobe?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2005-12-22  7:14 Zhang, Yanmin
  0 siblings, 0 replies; 32+ messages in thread
From: Zhang, Yanmin @ 2005-12-22  7:14 UTC (permalink / raw)
  To: Vara Prasad; +Cc: prasanna, systemtap, Keshavamurthy, Anil S, Mao, Bibo

Thanks for your info. I go through the patches 1 by 1. Perhaps later patches already fix the problems in prior patches, so my comments might be incorrect. :)

Yanmin

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Vara Prasad
>>Sent: 2005年12月22日 13:41
>>To: Zhang, Yanmin
>>Cc: prasanna@in.ibm.com; systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>Hi Yanmin,
>>
>>I appreciate your comments. Prasanna the author of the patch is on
>>vacation until the month end, he will address your comments once he is
>>back in Jan. Please feel free to finish the review of the 3rd patch as
>>well in the mean time.
>>
>>Thanks,
>>Vara Prasad
>>
>>Zhang, Yanmin wrote:
>>
>>>Below inline are the comments for patch 2/3.
>>>
>>>Yanmin
>>>
>>>
>>>
>>>>>Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>>>>
>>>>>
>>>>>---
>>>>>
>>>>>linux-2.6.13-prasanna/include/linux/kprobes.h |    2
>>>>>linux-2.6.13-prasanna/kernel/kprobes.c        |  113
>>>>>
>>>>>
>>>++++++++++++++++++++++++++
>>>
>>>
>>>>>2 files changed, 115 insertions(+)
>>>>>
>>>>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>>>
>>>>>
>>>kernel/kprobes.c
>>>
>>>
>>>>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>>>
>>>>>
>>>2005-09-14 11:01:18.495513696 +0530
>>>
>>>
>>>>>+++ linux-2.6.13-prasanna/kernel/kprobes.c	2005-09-14
>>>>>
>>>>>
>>>11:01:18.550505336 +0530
>>>
>>>
>>>>>@@ -652,6 +652,109 @@ static struct uprobe_module *get_module_
>>>>>}
>>>>>
>>>>>/*
>>>>>+ * Check if the given offset lies within given page range.
>>>>>+ */
>>>>>+static int find_page_probe(unsigned long offset, unsigned long
>>>>>
>>>>>
>>>page_start)
>>>
>>>
>>>>>+{
>>>>>+      unsigned long page_end = page_start + PAGE_SIZE;
>>>>>+	if (offset >= page_start && offset < page_end)
>>>>>+		return 1;
>>>>>+	return 0;
>>>>>+}
>>>>>+
>>>>>+/*
>>>>>+ * This function handles the readpages of all modules that have
>>>>>
>>>>>
>>>active probes
>>>
>>>
>>>>>+ * in them. Here, we first call the original readpages() of this
>>>>>+ * inode / address_space to actually read the pages into memory.
>>>>>
>>>>>
>>>Then, we will
>>>
>>>
>>>>>+ * insert all the probes that are specified in this pages before
>>>>>
>>>>>
>>>returning.
>>>
>>>
>>>>>+ */
>>>>>+static int up_readpages(struct file *file, struct address_space
>>>>>
>>>>>
>>>*mapping,
>>>
>>>
>>>>>+			struct list_head *pages, unsigned nr_pages)
>>>>>+{
>>>>>+	int retval = 0;
>>>>>+	struct page *page;
>>>>>+	struct uprobe_module *m;
>>>>>+	struct uprobe *up = NULL;
>>>>>+	struct hlist_node *node;
>>>>>+
>>>>>+	m = get_module_by_inode(file->f_dentry->d_inode);
>>>>>
>>>>>
>>>There is a race condition between unregister_userspace_probe and here.
>>>If a thread jumps to the beginning of function up_readpages while
>>>another thread calls unregister_userspace_probe to delete the um, the
>>>first thread might return error.
>>>
>>>
>>>
>>>
>>>>>+	if (!m) {
>>>>>+		printk("up_readpages: major problem. we don't  \
>>>>>+						have mod for this
>>>>>
>>>>>
>>>!!!\n");
>>>
>>>
>>>>>+		return -EINVAL;
>>>>>+	}
>>>>>+
>>>>>+	/* call original readpages() */
>>>>>+	retval = m->ori_a_ops->readpages(file, mapping, pages,
>>>>>
>>>>>
>>>nr_pages);
>>>
>>>
>>>>>+	if (retval >= 0) {
>>>>>+		hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>>>+			/*
>>>>>+			 * TODO: Walk through readpages page list and
>>>>>
>>>>>
>>>get
>>>
>>>
>>>>>+			 * pages with probes instead of find_get_page().
>>>>>+			 */
>>>>>+			if ((page = find_get_page(mapping,
>>>>>+				up->offset >> PAGE_CACHE_SHIFT)) !=
>>>>>
>>>>>
>>>NULL) {
>>>
>>>
>>>>>+				if (find_page_probe
>>>>>+				    (up->offset >> PAGE_CACHE_SHIFT,
>>>>>+				     page->index << PAGE_CACHE_SHIFT)) {
>>>>>+					up->page = page;
>>>>>+					if (!map_uprobe_page(up, 0)) {
>>>>>+						lock_page(up->page);
>>>>>
>>>>>
>>>The first patch doesn't do lock_page before calling insert_probe_page.
>>>Why does this patch do so? It's inconsistent.
>>>
>>>
>>>
>>>
>>>>>+						insert_probe_page(up);
>>>>>+						unmap_uprobe_page(up);
>>>>>+						unlock_page(up->page);
>>>>>+					}
>>>>>+				}
>>>>>+				page_cache_release(up->page);
>>>>>+			}
>>>>>+		}
>>>>>+	}
>>>>>+	return retval;
>>>>>+}
>>>>>+
>>>>>+/*
>>>>>+ * This function handles the readpage of all modules that have active
>>>>>
>>>>>
>>>probes
>>>
>>>
>>>>>+ * in them. Here, we first call the original readpage() of this
>>>>>+ * inode / address_space to actually read the page into memory. Then,
>>>>>
>>>>>
>>>we will
>>>
>>>
>>>>>+ * insert all the probes that are specified in this page before
>>>>>
>>>>>
>>>returning.
>>>
>>>
>>>>>+ */
>>>>>+int up_readpage(struct file *file, struct page *page)
>>>>>+{
>>>>>+	int retval = 0;
>>>>>+	struct uprobe_module *m;
>>>>>+	struct uprobe *up = NULL;
>>>>>+	int kprobe_page_mapped = 0;
>>>>>+	struct hlist_node *node;
>>>>>+
>>>>>+	m = get_module_by_inode(file->f_dentry->d_inode);
>>>>>
>>>>>
>>>The same race condition like above function.
>>>
>>>
>>>
>>>
>>>>>+	if (!m) {
>>>>>+		printk("up_readpage: major problem. we don't have mod
>>>>>
>>>>>
>>>for this !!!\n");
>>>
>>>
>>>>>+		return -EINVAL;
>>>>>+	}
>>>>>+
>>>>>+	/* call original readpage() */
>>>>>+	retval = m->ori_a_ops->readpage(file, page);
>>>>>+	if (retval >= 0) {
>>>>>+		hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>>>+			if (find_page_probe (up->offset >>
>>>>>
>>>>>
>>>PAGE_CACHE_SHIFT,
>>>
>>>
>>>>>+					page->index <<
>>>>>
>>>>>
>>>PAGE_CACHE_SHIFT)) {
>>>
>>>
>>>>>+				up->page = page;
>>>>>+				if (!map_uprobe_page(up,
>>>>>
>>>>>
>>>kprobe_page_mapped)) {
>>>
>>>
>>>>>+					lock_page(up->page);
>>>>>
>>>>>
>>>Same inconsistent issue.
>>>
>>>
>>>
>>>
>>>>>+					kprobe_page_mapped = 1;
>>>>>+					retval = insert_probe_page(up);
>>>>>+				}
>>>>>+			}
>>>>>+		}
>>>>>+		if (kprobe_page_mapped) {
>>>>>
>>>>>
>>>The logic here is incorrect. If there are many uprobes at the same page,
>>>up just points to the last one. How about others?
>>>
>>>
>>>
>>>
>>>
>>>>>+			unmap_uprobe_page(up);
>>>>>+			unlock_page(up->page);
>>>>>+		}
>>>>>+	}
>>>>>+	return retval;
>>>>>+}
>>>>>+
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Review patches of user space kprobe
  2005-12-22  5:41 Zhang, Yanmin
@ 2005-12-22  6:00 ` Vara Prasad
  2006-01-05 11:06 ` Prasanna S Panchamukhi
  1 sibling, 0 replies; 32+ messages in thread
From: Vara Prasad @ 2005-12-22  6:00 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: prasanna, systemtap, Keshavamurthy, Anil S, Mao, Bibo

Hi Yanmin,

I appreciate your comments. Prasanna the author of the patch is on 
vacation until the month end, he will address your comments once he is 
back in Jan. Please feel free to finish the review of the 3rd patch as 
well in the mean time.

Thanks,
Vara Prasad

Zhang, Yanmin wrote:

>Below inline are the comments for patch 2/3.
>
>Yanmin
>
>  
>
>>>Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>>
>>>
>>>---
>>>
>>>linux-2.6.13-prasanna/include/linux/kprobes.h |    2 
>>>linux-2.6.13-prasanna/kernel/kprobes.c        |  113
>>>      
>>>
>++++++++++++++++++++++++++
>  
>
>>>2 files changed, 115 insertions(+)
>>>
>>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>      
>>>
>kernel/kprobes.c
>  
>
>>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>      
>>>
>2005-09-14 11:01:18.495513696 +0530
>  
>
>>>+++ linux-2.6.13-prasanna/kernel/kprobes.c	2005-09-14
>>>      
>>>
>11:01:18.550505336 +0530
>  
>
>>>@@ -652,6 +652,109 @@ static struct uprobe_module *get_module_
>>>}
>>>
>>>/*
>>>+ * Check if the given offset lies within given page range.
>>>+ */
>>>+static int find_page_probe(unsigned long offset, unsigned long
>>>      
>>>
>page_start)
>  
>
>>>+{
>>>+      unsigned long page_end = page_start + PAGE_SIZE;
>>>+	if (offset >= page_start && offset < page_end)
>>>+		return 1;
>>>+	return 0;
>>>+}
>>>+
>>>+/*
>>>+ * This function handles the readpages of all modules that have
>>>      
>>>
>active probes
>  
>
>>>+ * in them. Here, we first call the original readpages() of this
>>>+ * inode / address_space to actually read the pages into memory.
>>>      
>>>
>Then, we will
>  
>
>>>+ * insert all the probes that are specified in this pages before
>>>      
>>>
>returning.
>  
>
>>>+ */
>>>+static int up_readpages(struct file *file, struct address_space
>>>      
>>>
>*mapping,
>  
>
>>>+			struct list_head *pages, unsigned nr_pages)
>>>+{
>>>+	int retval = 0;
>>>+	struct page *page;
>>>+	struct uprobe_module *m;
>>>+	struct uprobe *up = NULL;
>>>+	struct hlist_node *node;
>>>+
>>>+	m = get_module_by_inode(file->f_dentry->d_inode);
>>>      
>>>
>There is a race condition between unregister_userspace_probe and here.
>If a thread jumps to the beginning of function up_readpages while
>another thread calls unregister_userspace_probe to delete the um, the
>first thread might return error.
>
>
>  
>
>>>+	if (!m) {
>>>+		printk("up_readpages: major problem. we don't  \
>>>+						have mod for this
>>>      
>>>
>!!!\n");
>  
>
>>>+		return -EINVAL;
>>>+	}
>>>+
>>>+	/* call original readpages() */
>>>+	retval = m->ori_a_ops->readpages(file, mapping, pages,
>>>      
>>>
>nr_pages);
>  
>
>>>+	if (retval >= 0) {
>>>+		hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>+			/*
>>>+			 * TODO: Walk through readpages page list and
>>>      
>>>
>get
>  
>
>>>+			 * pages with probes instead of find_get_page().
>>>+			 */
>>>+			if ((page = find_get_page(mapping,
>>>+				up->offset >> PAGE_CACHE_SHIFT)) !=
>>>      
>>>
>NULL) {
>  
>
>>>+				if (find_page_probe
>>>+				    (up->offset >> PAGE_CACHE_SHIFT,
>>>+				     page->index << PAGE_CACHE_SHIFT)) {
>>>+					up->page = page;
>>>+					if (!map_uprobe_page(up, 0)) {
>>>+						lock_page(up->page);
>>>      
>>>
>The first patch doesn't do lock_page before calling insert_probe_page.
>Why does this patch do so? It's inconsistent.
>
>
>  
>
>>>+						insert_probe_page(up);
>>>+						unmap_uprobe_page(up);
>>>+						unlock_page(up->page);
>>>+					}
>>>+				}
>>>+				page_cache_release(up->page);
>>>+			}
>>>+		}
>>>+	}
>>>+	return retval;
>>>+}
>>>+
>>>+/*
>>>+ * This function handles the readpage of all modules that have active
>>>      
>>>
>probes
>  
>
>>>+ * in them. Here, we first call the original readpage() of this
>>>+ * inode / address_space to actually read the page into memory. Then,
>>>      
>>>
>we will
>  
>
>>>+ * insert all the probes that are specified in this page before
>>>      
>>>
>returning.
>  
>
>>>+ */
>>>+int up_readpage(struct file *file, struct page *page)
>>>+{
>>>+	int retval = 0;
>>>+	struct uprobe_module *m;
>>>+	struct uprobe *up = NULL;
>>>+	int kprobe_page_mapped = 0;
>>>+	struct hlist_node *node;
>>>+
>>>+	m = get_module_by_inode(file->f_dentry->d_inode);
>>>      
>>>
>The same race condition like above function.
>
>
>  
>
>>>+	if (!m) {
>>>+		printk("up_readpage: major problem. we don't have mod
>>>      
>>>
>for this !!!\n");
>  
>
>>>+		return -EINVAL;
>>>+	}
>>>+
>>>+	/* call original readpage() */
>>>+	retval = m->ori_a_ops->readpage(file, page);
>>>+	if (retval >= 0) {
>>>+		hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>+			if (find_page_probe (up->offset >>
>>>      
>>>
>PAGE_CACHE_SHIFT,
>  
>
>>>+					page->index <<
>>>      
>>>
>PAGE_CACHE_SHIFT)) {
>  
>
>>>+				up->page = page;
>>>+				if (!map_uprobe_page(up,
>>>      
>>>
>kprobe_page_mapped)) {
>  
>
>>>+					lock_page(up->page);
>>>      
>>>
>Same inconsistent issue.
>
>
>  
>
>>>+					kprobe_page_mapped = 1;
>>>+					retval = insert_probe_page(up);
>>>+				}
>>>+			}
>>>+		}
>>>+		if (kprobe_page_mapped) {
>>>      
>>>
>The logic here is incorrect. If there are many uprobes at the same page,
>up just points to the last one. How about others? 
>
>
>
>  
>
>>>+			unmap_uprobe_page(up);
>>>+			unlock_page(up->page);
>>>+		}
>>>+	}
>>>+	return retval;
>>>+}
>>>+
>>>      
>>>
>
>
>  
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2005-12-22  5:41 Zhang, Yanmin
  2005-12-22  6:00 ` Vara Prasad
  2006-01-05 11:06 ` Prasanna S Panchamukhi
  0 siblings, 2 replies; 32+ messages in thread
From: Zhang, Yanmin @ 2005-12-22  5:41 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

Below inline are the comments for patch 2/3.

Yanmin

>>
>>Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>
>>
>>---
>>
>> linux-2.6.13-prasanna/include/linux/kprobes.h |    2 
>> linux-2.6.13-prasanna/kernel/kprobes.c        |  113
++++++++++++++++++++++++++
>> 2 files changed, 115 insertions(+)
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-readpages
kernel/kprobes.c
>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-readpages
2005-09-14 11:01:18.495513696 +0530
>>+++ linux-2.6.13-prasanna/kernel/kprobes.c	2005-09-14
11:01:18.550505336 +0530
>>@@ -652,6 +652,109 @@ static struct uprobe_module *get_module_
>> }
>> 
>> /*
>>+ * Check if the given offset lies within given page range.
>>+ */
>>+static int find_page_probe(unsigned long offset, unsigned long
page_start)
>>+{
>>+      unsigned long page_end = page_start + PAGE_SIZE;
>>+	if (offset >= page_start && offset < page_end)
>>+		return 1;
>>+	return 0;
>>+}
>>+
>>+/*
>>+ * This function handles the readpages of all modules that have
active probes
>>+ * in them. Here, we first call the original readpages() of this
>>+ * inode / address_space to actually read the pages into memory.
Then, we will
>>+ * insert all the probes that are specified in this pages before
returning.
>>+ */
>>+static int up_readpages(struct file *file, struct address_space
*mapping,
>>+			struct list_head *pages, unsigned nr_pages)
>>+{
>>+	int retval = 0;
>>+	struct page *page;
>>+	struct uprobe_module *m;
>>+	struct uprobe *up = NULL;
>>+	struct hlist_node *node;
>>+
>>+	m = get_module_by_inode(file->f_dentry->d_inode);
There is a race condition between unregister_userspace_probe and here.
If a thread jumps to the beginning of function up_readpages while
another thread calls unregister_userspace_probe to delete the um, the
first thread might return error.


>>+	if (!m) {
>>+		printk("up_readpages: major problem. we don't  \
>>+						have mod for this
!!!\n");
>>+		return -EINVAL;
>>+	}
>>+
>>+	/* call original readpages() */
>>+	retval = m->ori_a_ops->readpages(file, mapping, pages,
nr_pages);
>>+	if (retval >= 0) {
>>+		hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>+			/*
>>+			 * TODO: Walk through readpages page list and
get
>>+			 * pages with probes instead of find_get_page().
>>+			 */
>>+			if ((page = find_get_page(mapping,
>>+				up->offset >> PAGE_CACHE_SHIFT)) !=
NULL) {
>>+				if (find_page_probe
>>+				    (up->offset >> PAGE_CACHE_SHIFT,
>>+				     page->index << PAGE_CACHE_SHIFT)) {
>>+					up->page = page;
>>+					if (!map_uprobe_page(up, 0)) {
>>+						lock_page(up->page);
The first patch doesn't do lock_page before calling insert_probe_page.
Why does this patch do so? It's inconsistent.


>>+						insert_probe_page(up);
>>+						unmap_uprobe_page(up);
>>+						unlock_page(up->page);
>>+					}
>>+				}
>>+				page_cache_release(up->page);
>>+			}
>>+		}
>>+	}
>>+	return retval;
>>+}
>>+
>>+/*
>>+ * This function handles the readpage of all modules that have active
probes
>>+ * in them. Here, we first call the original readpage() of this
>>+ * inode / address_space to actually read the page into memory. Then,
we will
>>+ * insert all the probes that are specified in this page before
returning.
>>+ */
>>+int up_readpage(struct file *file, struct page *page)
>>+{
>>+	int retval = 0;
>>+	struct uprobe_module *m;
>>+	struct uprobe *up = NULL;
>>+	int kprobe_page_mapped = 0;
>>+	struct hlist_node *node;
>>+
>>+	m = get_module_by_inode(file->f_dentry->d_inode);
The same race condition like above function.


>>+	if (!m) {
>>+		printk("up_readpage: major problem. we don't have mod
for this !!!\n");
>>+		return -EINVAL;
>>+	}
>>+
>>+	/* call original readpage() */
>>+	retval = m->ori_a_ops->readpage(file, page);
>>+	if (retval >= 0) {
>>+		hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>+			if (find_page_probe (up->offset >>
PAGE_CACHE_SHIFT,
>>+					page->index <<
PAGE_CACHE_SHIFT)) {
>>+				up->page = page;
>>+				if (!map_uprobe_page(up,
kprobe_page_mapped)) {
>>+					lock_page(up->page);
Same inconsistent issue.


>>+					kprobe_page_mapped = 1;
>>+					retval = insert_probe_page(up);
>>+				}
>>+			}
>>+		}
>>+		if (kprobe_page_mapped) {
The logic here is incorrect. If there are many uprobes at the same page,
up just points to the last one. How about others? 



>>+			unmap_uprobe_page(up);
>>+			unlock_page(up->page);
>>+		}
>>+	}
>>+	return retval;
>>+}
>>+

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2005-12-22  5:34 Zhang, Yanmin
  2006-01-05 10:30 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2005-12-22  5:34 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

One more comment for patch 1/3.

register_aggr_kprobe needs to be aware of user space probe.

Yanmin

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Zhang, Yanmin
>>Sent: 2005年12月22日 11:37
>>To: prasanna@in.ibm.com
>>Cc: systemtap@sources.redhat.com; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: RE: Review patches of user space kprobe
>>
>>General question:
>>1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 cases related to the question.
>>	a) Many applications execute codes produced themselves, such like JIT (Just-In-Time) of JVM.
>>	b) Some executables include TEXTREL section. When they are loaded into memory and linked dynamically, the text section might be
>>changed, and kernel will do a Copy-On-Write to create a new anonymous page and map the new page to the process address space. So after
>>the process starts, we couldn't insert uprobe on its copied pages.
>>Should a new interface be added to support it? The parameters could be process id and offset in the process address space. Of course,
>>it could be an enhancement and implemented later.
>>
>>3) Can function register_userspace_probe do not call register_kprobe? I think it's not necessary. It's just my feeling. It's up to you
>>to make decision. :)
>>
>>2) Function get_inode_ops should take care of errors and its caller, register_userspace_probe should check if the return value of
>>get_inode_ops is IS_ERR. If so, the error code should be returned instead of a hard-coded -ENOSYS.
>>
>>Other more comments against patch 1/3 are inline below.
>>
>>Yanmin
>>
>>>>-----Original Message-----
>>>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Zhang, Yanmin
>>>>Sent: 2005年12月21日 16:30
>>>>To: prasanna@in.ibm.com
>>>>>>+static struct vm_area_struct *find_get_vma(unsigned long offset,
>>>>>>+					   struct address_space
>>>>*mapping)
>>>>>>+{
>>>>>>+	struct vm_area_struct *vma = NULL;
>>>>>>+	struct prio_tree_iter iter;
>>>>>>+	struct prio_tree_root *head = &mapping->i_mmap;
>>>>>>+	struct mm_struct *mm;
>>>>>>+	unsigned long start, end;
>>>>>>+
>>>>>>+	spin_lock(&mapping->i_mmap_lock);
>>>>>>+	vma_prio_tree_foreach(vma, &iter, head, offset, offset) {
>>>>>>+		mm = vma->vm_mm;
>>>>>>+		spin_lock(&mm->page_table_lock);
>>>>>>+		start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT);
>>>>>>+		end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT);
>>>>>>+		spin_unlock(&mm->page_table_lock);
>>>>>>+		if ((start + offset) < end) {
>>>>>>+			spin_unlock(&mapping->i_mmap_lock);
>>>>>>+			return vma;
>>It's not safe to return vma without lock. There is a race condition. If vma is released by another thread, kernel might be crazy when
>>this thread tries to access it later.
>>
>>If the page is mapped to many vma in different processes, function find_get_vma just returns one vma. It's not enough.
>>
>>I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach loop.
>>
>>
>>>>>>+		}
>>>>>>+	}
>>>>>>+	spin_unlock(&mapping->i_mmap_lock);
>>>>>>+	return NULL;
>>>>>>+}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2005-12-22  5:09 Zhang, Yanmin
  2006-01-05 10:29 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2005-12-22  5:09 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

General question:
1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 cases related to the question.
	a) Many applications execute codes produced themselves, such like JIT (Just-In-Time) of JVM.
	b) Some executables include TEXTREL section. When they are loaded into memory and linked dynamically, the text section might be changed, and kernel will do a Copy-On-Write to create a new anonymous page and map the new page to the process address space. So after the process starts, we couldn't insert uprobe on its copied pages.
Should a new interface be added to support it? The parameters could be process id and offset in the process address space. Of course, it could be an enhancement and implemented later.

3) Can function register_userspace_probe do not call register_kprobe? I think it's not necessary. It's just my feeling. It's up to you to make decision. :)

2) Function get_inode_ops should take care of errors and its caller, register_userspace_probe should check if the return value of get_inode_ops is IS_ERR. If so, the error code should be returned instead of a hard-coded -ENOSYS.

Other more comments against patch 1/3 are inline below.

Yanmin

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Zhang, Yanmin
>>Sent: 2005年12月21日 16:30
>>To: prasanna@in.ibm.com
>>>>+static struct vm_area_struct *find_get_vma(unsigned long offset,
>>>>+					   struct address_space
>>*mapping)
>>>>+{
>>>>+	struct vm_area_struct *vma = NULL;
>>>>+	struct prio_tree_iter iter;
>>>>+	struct prio_tree_root *head = &mapping->i_mmap;
>>>>+	struct mm_struct *mm;
>>>>+	unsigned long start, end;
>>>>+
>>>>+	spin_lock(&mapping->i_mmap_lock);
>>>>+	vma_prio_tree_foreach(vma, &iter, head, offset, offset) {
>>>>+		mm = vma->vm_mm;
>>>>+		spin_lock(&mm->page_table_lock);
>>>>+		start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT);
>>>>+		end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT);
>>>>+		spin_unlock(&mm->page_table_lock);
>>>>+		if ((start + offset) < end) {
>>>>+			spin_unlock(&mapping->i_mmap_lock);
>>>>+			return vma;
It's not safe to return vma without lock. There is a race condition. If vma is released by another thread, kernel might be crazy when this thread tries to access it later.

If the page is mapped to many vma in different processes, function find_get_vma just returns one vma. It's not enough.

I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach loop.


>>>>+		}
>>>>+	}
>>>>+	spin_unlock(&mapping->i_mmap_lock);
>>>>+	return NULL;
>>>>+}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: Review patches of user space kprobe
@ 2005-12-21  8:31 Zhang, Yanmin
  2006-01-05 10:28 ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yanmin @ 2005-12-21  8:31 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap, Keshavamurthy, Anil S, Mao, Bibo

Parsanna,

I read the patches and have some comments. Generally, I think user space
kprobe is more useful than kernel kprobe, and it will attract far more
people who work on applications.

I wrote down inline comments for patch 1/3 below.

Yanmin

>>This patch adds two new interfaces to insert and remove userspace
probes.
>>This new interface uses two userspace probe objects.
>>Objects:
>>-------
>>Each user space probes is uniquely identified by the 
>>combination of inode and offset. New user space probes defines
>>two objects
>>1. struct uprobe - per probe
>>2. struct uprobe_module - per application that has probes on 
>>   it. This is allocated internally per application.
>>
>>	
>>	struct uprobe {
>>		/*path of the application */
>>		char name[50]; 
>>		/*includes kprobe structure */
>>		struct kprobe kp;
>>		/*list of probe per module */
>>		struct hlist_node ulist;
>>		/*inode of the application */
>>		struct inode *inode;
How about using a pointer to uprobe_module to locate the inode? Although
the storage space is same, it's better to keep all the same info of
uprobes in uprobe_module. Uprobes just need keep its own specific info.


>>		/*offset within the page */
This comment is incorrect. Functions, such like
register_userspace_probe, use it as the offset from the beginning of the
file, not a page of the file. Right? Although your example uses it as an
offset of page. Frank's reply said this item could be deleted. I don't
think he is correct because uprobe->kp.addr is used when calling
register_kprobe.


>>		unsigned long offset; 
>>		/*page containing probes */
>>		struct page *page;
>>		/* virtual memory area containing the probes */
>>		struct vm_area_struct *vma;
Item page and vma are used temporarily. They are not used when the probe
is hit. To shrink struct uprobe, they could be deleted. A new super
struct could be defined to include struct uprobe, or always to search
them when they are needed. Application might be inserted huge uprobes.
It's better to keep the struct smaller.


>>	};
>>
>>	struct uprobe_module {
>>		struct hlist_head ulist_head;
>>		/* hlist head containing list of probes within this
module */ 
>>		struct list_head mlist;
>>		/* list of uprobes_modules */
>>		struct inode *inode;
Item inode could be deleted. Use uprobe_module->nd.dentry->d_inode.


>>		/* inode of the application on which probes are
implanted */
>>		struct nameidata nd;
>>		/* to hold path/dentry etc. */
Struct nameidata looks a little big. A pointer to dentry might be
enough.



>>	};
>>
>>Interfaces :
>>------------
>>Two new interfaces are defined to insert and remove user 
>>space probes
>>register_user space_probe(struct uprobe *);
>>unregister_user space_probe(struct uprobe *);
>>	 
>>register_user space_probe() accepts pointer to struct uprobe.
>>User has to allocate uprobes structure and initialize following 
>>elements
>>	name		- contains the path of the application;
>>	offset		- offset of the probe within the page;
>>	kp.addr 	- virtual address within the executable.
>>	kp.pre_handler 	- handler to be executed when probe is fired.
>>	kp.post_handler - handler to be executed after single stepping 
>>			  the original instruction.
>>	kp.fault_handler- handler to be executed if fault occurs while 
>>			  executing the original instruction or the 
>>			  handlers.
>>
>>page and vma are used internally by register_user space_probe().	
>>	
>>unregister_user space_probe(struct uprobe *);
>>
>>register_user space_probe() first walks through the given path of the 
>>application and gets the inode of the application where the probes are

>>to be inserted. Struct uprobe_module is allocated for the each 
>>application on which probes are to be inserted. If the probes are to 
>>be inserted on the application which has probes earlier on them, then 
>>the same struct uprobe_module is used. Also uprobe structure is added 
>>to ulist_head within the uprobe_module. Then walk through the list of 
>>process private mappings and get the virtual memory area containing
the
>>offset. Get the reference to the hashed page containing the offset. If

>>the page is found, it should be checked if it is uptodate and lock the

>>page in the memory so that the page is not swapped out. Now map the 
>>page; insert the probe and then unmap the page. Release the page lock.

>>Thus probes are inserted in the pages that are already read into the 
>>memory at the time of registration.
>>
>>unregister_user space_probe() get the vma and the page containing the 
>>offset of the user space probes. Maps the page containing the probes 
>>before replacing the int3/breakpoint instruction with the original 
>>instruction and then unmap the page. Remove the uprobe structure from 
>>ulist_head and check if the uprobe is the last one within this module 
>>to be unregistered. If the uprobe is the last one within the
application
>>to be unregistered, remove the uprobe_module from the 
>>uprobe_module_list. If there are no references to the uprobe_module 
>>struct free it.
>>
>>Usage:
>>Usage is similar to kprobe.
>>	/* Allocate a uprobe structure */
>>	struct uprobe p;
>>
>>	/* Define pre handler */
>>	int handler_pre(struct kprobe *p, struct pt_regs *regs)
>>	{
>>		<.............collect useful data..............>
>>	}
>>
>>	void handler_post(struct kprobe *p, struct pt_regs *regs, 
>>							unsigned long
flags)
>>	{
>>		<.............collect useful data..............>
>>	}
>>
>>	int handler_fault(struct kprobe *p, struct pt_regs *regs, int
trapnr)
>>	{
>>		<.............collect useful data..............>
>>	}
>>
>>	While instering the probe, specify the patchname of the
application
>>	on which probes are to be inserted.
>>
>>	char pname[] ="/tmp/MOD/appln";
>>
>>	/* pre_handler */
>>	p.kp.pre_handler=handler_pre;
>>	/* pre_handler */
>>	p.kp.post_handler=handler_post;
>>	/* pre_handler */
>>	p.kp.fault_handler=handler_fault;
>>	/* Secify the offset within the page*/
>>	p.offset = (kprobe_opcode_t *)0x4d4;
>>	/* Secify the address within the application/executable */
>>	/* $nm appln |grep func1 */
>>	p.kp.addr = (kprobe_opcode_t *)0x80484d4;
>>	/* copy the string name to p.name */
>>	strcpy(p.name, pname);
>>	/* Now registe the userspace probe */
>>	register_userspace_probe(&p);
>>
>>
>>	/* To unregister the registered probed, just call..*/
>>	unregister_userspace_probe(&p);
>>
>>Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>
>>
>>---
>>
>> linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c |   14 +
>> linux-2.6.13-prasanna/fs/namei.c                 |   12 
>> linux-2.6.13-prasanna/include/linux/kprobes.h    |   25 +
>> linux-2.6.13-prasanna/include/linux/namei.h      |    1 
>> linux-2.6.13-prasanna/kernel/kprobes.c           |  303
++++++++++++++++++++++-
>> 5 files changed, 344 insertions(+), 11 deletions(-)
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-newinterface
kernel/kprobes.c
>>---
linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-newinterface
2005-09-14 11:00:19.987408280 +0530
>>+++ linux-2.6.13-prasanna/kernel/kprobes.c	2005-09-14
11:00:53.943246208 +0530
>>@@ -37,6 +37,7 @@
>> #include <linux/init.h>
>> #include <linux/module.h>
>> #include <linux/moduleloader.h>
>>+#include <linux/namei.h>
>> #include <asm/cacheflush.h>
>> #include <asm/errno.h>
>> #include <asm/kdebug.h>
>>@@ -46,6 +47,7 @@
>> 
>> static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>> static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>>+static struct list_head uprobe_module_list;
>> 
>> unsigned int kprobe_cpu = NR_CPUS;
>> static DEFINE_SPINLOCK(kprobe_lock);
>>@@ -417,7 +419,11 @@ static int register_aggr_kprobe(struct k
>> /* kprobe removal house-keeping routines */
>> static inline void cleanup_kprobe(struct kprobe *p, unsigned long
flags)
>> {
>>-	arch_disarm_kprobe(p);
>>+	if (!kernel_text_address((unsigned long)p->addr)) {
>>+		struct uprobe *up = container_of(p, struct uprobe, kp);
>>+		arch_disarm_uprobe(up);
>>+	} else
>>+		arch_disarm_kprobe(p);
>> 	hlist_del(&p->hlist);
>> 	spin_unlock_irqrestore(&kprobe_lock, flags);
>> 	arch_remove_kprobe(p);
>>@@ -434,30 +440,71 @@ static inline void cleanup_aggr_kprobe(s
>> 		spin_unlock_irqrestore(&kprobe_lock, flags);
>> }
>> 
>>+/*
>>+ * This routine check if the probe already exits at the given offset
and inode.
>>+ * Returns the maching pointer, if found.
>>+ */
>>+static struct kprobe *get_kprobe_user(struct kprobe *kp)
>>+{
>>+	struct uprobe *up = container_of(kp, struct uprobe, kp);
>>+	struct hlist_head *head;
>>+	struct hlist_node *node;
>>+	struct kprobe *p;
>>+
>>+	head = &kprobe_table[hash_long((unsigned long)up->inode *
up->offset,
>>+				       KPROBE_HASH_BITS)];
>>+	hlist_for_each_entry(p, node, head, hlist) {
>>+		struct uprobe *user = container_of(p, struct uprobe,
kp);
>>+		if (user->inode == up->inode && user->offset ==
up->offset)
>>+			return p;
>>+	}
>>+	return NULL;
>>+}
>>+
>>+static kprobe_opcode_t *insert_kprobe_user(struct kprobe *p)
>>+{
>>+	kprobe_opcode_t *addr;
>>+	struct uprobe *up = container_of(p, struct uprobe, kp);
>>+	addr = (kprobe_opcode_t *)
>>+		((unsigned long)(up->inode) * (unsigned
long)(up->offset));
>>+	if ((up->page) && (PageUptodate(up->page))) {
>>+		arch_copy_kprobe(p);
>>+		arch_arm_uprobe(up);
>>+	}
>>+	return addr;
>>+}
>>+
>> int register_kprobe(struct kprobe *p)
>> {
>> 	int ret = 0;
>> 	unsigned long flags = 0;
>> 	struct kprobe *old_p;
>>+	kprobe_opcode_t *addr;
>> 
>> 	if ((ret = arch_prepare_kprobe(p)) != 0) {
>> 		goto rm_kprobe;
>> 	}
>> 	spin_lock_irqsave(&kprobe_lock, flags);
>>-	old_p = get_kprobe(p->addr);
>>+	if (!kernel_text_address((unsigned long)p->addr))
>>+		old_p = get_kprobe_user(p);
>>+	else
>>+		old_p = get_kprobe(p->addr);
>> 	p->nmissed = 0;
>> 	if (old_p) {
>> 		ret = register_aggr_kprobe(old_p, p);
>> 		goto out;
>> 	}
>> 
>>-	arch_copy_kprobe(p);
>> 	INIT_HLIST_NODE(&p->hlist);
>>-	hlist_add_head(&p->hlist,
>>-		       &kprobe_table[hash_ptr(p->addr,
KPROBE_HASH_BITS)]);
>>-
>>-  	arch_arm_kprobe(p);
>>-
>>+	if (!kernel_text_address((unsigned long)p->addr))
>>+		addr = insert_kprobe_user(p);
After calling insert_kprobe_user, p->addr is still equal to the return
value of kmap_atomic. So when the uprobe is hit, how to find it from the
hash table? And I didn't see kprobe_handler is changed to call
get_kprobe_user if uprobe is hit.



>>+	else {
>>+		addr = p->addr;
>>+		arch_copy_kprobe(p);
>>+  		arch_arm_kprobe(p);
>>+	}
>>+  	hlist_add_head(&p->hlist,
>>+		       &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)]);
>> out:
>> 	spin_unlock_irqrestore(&kprobe_lock, flags);
>> rm_kprobe:
>>@@ -472,7 +519,10 @@ void unregister_kprobe(struct kprobe *p)
>> 	struct kprobe *old_p;
>> 
>> 	spin_lock_irqsave(&kprobe_lock, flags);
>>-	old_p = get_kprobe(p->addr);
>>+	if (!kernel_text_address((unsigned long)p->addr))
>>+		old_p = get_kprobe_user(p);
>>+	else
>>+		old_p = get_kprobe(p->addr);
>> 	if (old_p) {
>> 		if (old_p->pre_handler == aggr_pre_handler)
>> 			cleanup_aggr_kprobe(old_p, p, flags);
>>@@ -501,6 +551,236 @@ void unregister_jprobe(struct jprobe *jp
>> 	unregister_kprobe(&jp->kp);
>> }
>> 
>>+/*
>>+ * Wait for the page to be unlocked if someone else had locked it
>>+ * and map the page.
>>+ */
>>+static int map_uprobe_page(struct uprobe *up, int mapped)
The second parameter seems useless.



>>+{
>>+	/*nothing to to done, page is already mapped */
>>+	if (mapped)
>>+		return 0;
>>+	if (!up->page)
>>+		return 1;
>>+
>>+	wait_on_page_locked(up->page);
>>+	/* we could probably retry readpage here. */
I agree with above comment.



>>+	if (!PageUptodate(up->page))
>>+		return 1;
>>+	up->kp.addr = (kprobe_opcode_t *) kmap_atomic(up->page,
KM_USER0);
>>+	return 0;
>>+}
>>+
>>+/*
>>+ * Get the offset with the mapped page then register kprobe.
>>+ */
>>+static int insert_probe_page(struct uprobe *up)
>>+{
>>+	up->kp.addr = (kprobe_opcode_t *) ((unsigned long)up->kp.addr +
>>+				 (unsigned long)(up->offset &
~PAGE_MASK));
Does it guarantee later register_kprobe could distinguish it's a user
probe? Perhaps it's better to assign a fixed address in user space, such
like 0.


>>+	up->kp.opcode = 0;
It's not necessary to initiate kp.opcode as zero here.



>>+	return register_kprobe(&up->kp);
>>+
>>+}
>>+
>>+static void unmap_uprobe_page(struct uprobe *up)
>>+{
>>+	kunmap_atomic(up->kp.addr, KM_USER0);
>>+}
>>+
>>+/*
>>+ * find_get_vma walks through the list of process private mappings
and
>>+ * returns the pointer to vma containing the offset if found.
>>+ */
>>+static struct vm_area_struct *find_get_vma(unsigned long offset,
>>+					   struct address_space
*mapping)
>>+{
>>+	struct vm_area_struct *vma = NULL;
>>+	struct prio_tree_iter iter;
>>+	struct prio_tree_root *head = &mapping->i_mmap;
>>+	struct mm_struct *mm;
>>+	unsigned long start, end;
>>+
>>+	spin_lock(&mapping->i_mmap_lock);
>>+	vma_prio_tree_foreach(vma, &iter, head, offset, offset) {
>>+		mm = vma->vm_mm;
>>+		spin_lock(&mm->page_table_lock);
>>+		start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT);
>>+		end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT);
>>+		spin_unlock(&mm->page_table_lock);
>>+		if ((start + offset) < end) {
>>+			spin_unlock(&mapping->i_mmap_lock);
>>+			return vma;
>>+		}
>>+	}
>>+	spin_unlock(&mapping->i_mmap_lock);
>>+	return NULL;
>>+}
>>+
>>+/*
>>+ * Gets exclusive write access to the given inode to ensure that the
file
>>+ * on which probes are currently applied does not change. Use the
function,
>>+ * deny_write_access_to_inode() we added in fs/namei.c.
>>+ */
>>+static inline int ex_write_lock(struct inode *inode)
>>+{
>>+	return deny_write_access_to_inode(inode);
>>+}
>>+
>>+/*
>>+ * Called when removing user space probes to release the write lock
on the
>>+ * inode.
>>+ */
>>+static inline int ex_write_unlock(struct inode *inode)
>>+{
>>+	atomic_inc(&inode->i_writecount);
>>+	return 0;
>>+}
>>+
>>+/*
>>+ * Walk the uprobe_module_list and return the uprobe module with
matching
>>+ * inode.
>>+ */
>>+static struct uprobe_module *get_module_by_inode(struct inode *inode)
>>+{
>>+	struct uprobe_module *um;
>>+	list_for_each_entry(um, &uprobe_module_list, mlist) {
There is no any lock to protect uprobe_module_list. It's not safe under
multi-task environment.



>>+		if (um->inode == inode)
>>+			return um;
>>+	}
>>+	return NULL;
>>+}
>>+
>>+/*
>>+ * Walk the path and get the inode. Check for matching inode with the
module
>>+ * list.
>>+ */
>>+static struct uprobe_module *get_module_by_name(struct uprobe *p)
>>+{
>>+	struct nameidata nd;
>>+	struct inode *inode;
>>+	char *temp = p->name;
>>+
>>+	if (path_lookup(temp, LOOKUP_FOLLOW, &nd)) {
>>+		path_release(&nd);
>>+		printk("Failed to lookup the path\n");
>>+		return NULL;
>>+	}
>>+	inode = nd.dentry->d_inode;
>>+	path_release(&nd);
>>+	return get_module_by_inode(inode);
It's better to move path_release after calling get_module_by_inode in
case of potential race condition.


>>+}
>>+
>>+/*
>>+ * get inode operations.
>>+ *
>>+ * Walk the path name and get the inode. This function leaves with
the
>>+ * dentry held and taking with the inode writelock held to ensure
that the
>>+ * file on which probes are currently active does not change from
under us.
>>+ */
>>+static struct uprobe_module *get_inode_ops(struct uprobe *up)
>>+{
>>+	struct uprobe_module *um = NULL;
>>+	int error;
>>+	char *temp = up->name;
>>+
>>+	INIT_HLIST_NODE(&up->ulist);
>>+	um = get_module_by_name(up);
>>+	if (um) {
>>+		up->inode = um->inode;
>>+		hlist_add_head(&up->ulist, &um->ulist_head);
>>+		goto out;
>>+	}
>>+
>>+	um = kcalloc(1, sizeof(struct uprobe_module), GFP_KERNEL);
If kcalloc returns error, it should be checked here.


>>+	if (path_lookup(temp, LOOKUP_FOLLOW, &um->nd))
>>+		goto err;
>>+
>>+	up->inode = um->nd.dentry->d_inode;
>>+
>>+	if ((error = ex_write_lock(up->inode))) {
>>+		path_release(&um->nd);
>>+		goto err;
>>+	}
>>+	INIT_HLIST_HEAD(&um->ulist_head);
>>+	hlist_add_head(&up->ulist, &um->ulist_head);
>>+	list_add(&um->mlist, &uprobe_module_list);
>>+	um->inode = up->inode;
>>+	goto out;
>>+
>>+err:
>>+	kfree(um);
It looks like you forgot "return NULL" here.



>>+out:
>>+	return um;
>>+}
>>+/*
>>+ * physical insertion/removal of probes in the actual pages of the
module.
>>+ * Register user space probes before actually instering probes in the
page for
>>+ * a given pair of inode and offset.
>>+ */
>>+int register_userspace_probe(struct uprobe *up)
>>+{
>>+	struct address_space *mapping;
>>+	struct uprobe_module *um;
>>+	int error = 0;
>>+
>>+	if (!(um = get_inode_ops(up))) {
>>+		printk("get_inode_opertion: %s returned error %d\n",
up->name,
>>+		       error);
>>+		return -ENOSYS;
>>+	}
>>+
>>+	mapping = up->inode->i_mapping;
>>+	up->vma = find_get_vma(up->offset, mapping);
>>+	up->page = find_get_page(mapping, (up->offset >>
PAGE_CACHE_SHIFT));
>>+
>>+	if (!map_uprobe_page(up, 0)) {
If map_uprobe_page(up,0) != 0, register_userspace_probe still returns 0
meaning success register. It's incorrect and might confuse users.
How about change it to:
if (!(error=map_uprobe_page(up, 0))) {



>>+		error = insert_probe_page(up);
>>+		unmap_uprobe_page(up);
>>+	}
>>+
>>+	if (up->page)
>>+		page_cache_release(up->page);
>>+
>>+	return error;
>>+}
>>+
>>+/*
>>+ * physical insertion/removal of probes in the actual pages of the
module.
>>+ * Register user space probes before actually instering probes in the
page for
>>+ * a given pair of inode and offset.
>>+ *
>>+ */
>>+void unregister_userspace_probe(struct uprobe *up)
>>+{
>>+	kprobe_opcode_t *addr;
>>+	struct address_space *mapping = up->inode->i_mapping;
>>+	struct uprobe_module *um;
>>+
>>+	up->vma = find_get_vma(up->offset, mapping);
>>+	up->page = find_get_page(mapping, up->offset >>
PAGE_CACHE_SHIFT);
>>+
>>+	if (map_uprobe_page(up, 0)) {
>>+		printk("Probe unregister: failed\n");
>>+		goto out;
>>+	}
>>+	addr = (kprobe_opcode_t *) (up->kp.addr +
>>+			 (unsigned long)(up->offset & ~PAGE_MASK));
>>+	up->kp.addr = addr;
>>+	unregister_kprobe(&up->kp);
>>+	unmap_uprobe_page(up);
>>+	um = get_module_by_inode(up->inode);
>>+	hlist_del(&up->ulist);
>>+	if (hlist_empty(&um->ulist_head)) {
>>+		list_del(&um->mlist);
>>+		ex_write_unlock(up->inode);
>>+		path_release(&um->nd);	/* release path */
>>+	}
>>+out:
>>+	if (up->page)
>>+		page_cache_release(up->page);
>>+}
>>+
>> #ifdef ARCH_SUPPORTS_KRETPROBES
>> 
>> int register_kretprobe(struct kretprobe *rp)
>>@@ -574,6 +854,8 @@ static int __init init_kprobes(void)
>> 		INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
>> 	}
>> 
>>+	/* initialize uprobe_module_list */
>>+	INIT_LIST_HEAD(&uprobe_module_list);
>> 	err = arch_init_kprobes();
>> 	if (!err)
>> 		err = register_die_notifier(&kprobe_exceptions_nb);
>>@@ -590,4 +872,5 @@ EXPORT_SYMBOL_GPL(unregister_jprobe);
>> EXPORT_SYMBOL_GPL(jprobe_return);
>> EXPORT_SYMBOL_GPL(register_kretprobe);
>> EXPORT_SYMBOL_GPL(unregister_kretprobe);
>>-
>>+EXPORT_SYMBOL_GPL(register_userspace_probe);
>>+EXPORT_SYMBOL_GPL(unregister_userspace_probe);
>>diff -puN
include/linux/kprobes.h~kprobes_userspace_probes-newinterface
include/linux/kprobes.h
>>---
linux-2.6.13/include/linux/kprobes.h~kprobes_userspace_probes-newinterfa
ce	2005-09-14 11:00:20.025402504 +0530
>>+++ linux-2.6.13-prasanna/include/linux/kprobes.h	2005-09-14
11:00:20.047399160 +0530
>>@@ -33,7 +33,10 @@
>> #include <linux/list.h>
>> #include <linux/notifier.h>
>> #include <linux/smp.h>
>>-
>>+#include <linux/mm.h>
>>+#include <linux/dcache.h>
>>+#include <linux/namei.h>
>>+#include <linux/pagemap.h>
>> #include <asm/kprobes.h>
>> 
>> /* kprobe_status settings */
>>@@ -103,6 +106,24 @@ struct jprobe {
>> 	kprobe_opcode_t *entry;	/* probe handling code to jump to */
>> };
>> 
>>+
>>+struct uprobe {
>>+	char name[50];
>>+	struct kprobe kp;
>>+	struct hlist_node ulist; /*list of probe per module */
>>+	struct inode *inode;
>>+	unsigned long offset;
>>+	struct page *page;
>>+	struct vm_area_struct *vma;
>>+};
>>+
>>+struct uprobe_module {
>>+	struct hlist_head ulist_head;
>>+	struct list_head mlist;
>>+	struct inode *inode;
>>+	struct nameidata nd; /* to hold path/dentry etc. */
>>+};
>>+
>> #ifdef ARCH_SUPPORTS_KRETPROBES
>> extern void arch_prepare_kretprobe(struct kretprobe *rp, struct
pt_regs *regs);
>> #else /* ARCH_SUPPORTS_KRETPROBES */
>>@@ -159,6 +180,8 @@ extern int arch_init_kprobes(void);
>> extern void show_registers(struct pt_regs *regs);
>> extern kprobe_opcode_t *get_insn_slot(void);
>> extern void free_insn_slot(kprobe_opcode_t *slot);
>>+extern void arch_arm_uprobe(struct uprobe *up);
>>+extern void arch_disarm_uprobe(struct uprobe *up);
>> 
>> /* Get the kprobe at this addr (if any).  Must have called
lock_kprobes */
>> struct kprobe *get_kprobe(void *addr);
>>diff -puN
arch/i386/kernel/kprobes.c~kprobes_userspace_probes-newinterface
arch/i386/kernel/kprobes.c
>>---
linux-2.6.13/arch/i386/kernel/kprobes.c~kprobes_userspace_probes-newinte
rface	2005-09-14 11:00:20.029401896 +0530
>>+++ linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c	2005-09-14
11:00:20.048399008 +0530
>>@@ -87,6 +87,20 @@ void arch_disarm_kprobe(struct kprobe *p
>> 			   (unsigned long) p->addr +
sizeof(kprobe_opcode_t));
>> }
>> 
>>+void arch_arm_uprobe(struct uprobe *up)
>>+{
>>+	*up->kp.addr = BREAKPOINT_INSTRUCTION;
>>+	flush_icache_user_range(up->vma, up->page,
>>+			(unsigned long) up->kp.addr,
sizeof(kprobe_opcode_t));
>>+}
>>+
>>+void arch_disarm_uprobe(struct uprobe *up)
>>+{
>>+	*up->kp.addr = up->kp.opcode;
>>+	flush_icache_user_range(up->vma, up->page,
>>+			(unsigned long) up->kp.addr,
sizeof(kprobe_opcode_t));
>>+}
>>+
>> void arch_remove_kprobe(struct kprobe *p)
>> {
>> }
>>diff -puN include/linux/namei.h~kprobes_userspace_probes-newinterface
include/linux/namei.h
>>---
linux-2.6.13/include/linux/namei.h~kprobes_userspace_probes-newinterface
2005-09-14 11:00:20.033401288 +0530
>>+++ linux-2.6.13-prasanna/include/linux/namei.h	2005-09-14
11:00:20.048399008 +0530
>>@@ -73,6 +73,7 @@ extern int follow_up(struct vfsmount **,
>> 
>> extern struct dentry *lock_rename(struct dentry *, struct dentry *);
>> extern void unlock_rename(struct dentry *, struct dentry *);
>>+extern int deny_write_access_to_inode(struct inode *inode);
>> 
>> static inline void nd_set_link(struct nameidata *nd, char *path)
>> {
>>diff -puN fs/namei.c~kprobes_userspace_probes-newinterface fs/namei.c
>>--- linux-2.6.13/fs/namei.c~kprobes_userspace_probes-newinterface
2005-09-14 11:00:20.037400680 +0530
>>+++ linux-2.6.13-prasanna/fs/namei.c	2005-09-14 11:00:20.052398400
+0530
>>@@ -286,6 +286,18 @@ int get_write_access(struct inode * inod
>> 	return 0;
>> }
>> 
>>+int deny_write_access_to_inode(struct inode * inode)
>>+{
>>+	spin_lock(&inode->i_lock);
>>+	if (atomic_read(&inode->i_writecount) > 0) {
>>+		spin_unlock(&inode->i_lock);
>>+		return -ETXTBSY;
>>+	}
>>+	atomic_dec(&inode->i_writecount);
>>+	spin_unlock(&inode->i_lock);
>>+	return 0;
>>+}
>>+
>> int deny_write_access(struct file * file)
>> {
>> 	struct inode *inode = file->f_dentry->d_inode;
>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2006-01-09  2:06 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-06  4:27 Review patches of user space kprobe Zhang, Yanmin
2006-01-06 12:28 ` Prasanna S Panchamukhi
  -- strict thread matches above, loose matches on Subject: below --
2006-01-09  2:06 Zhang, Yanmin
2006-01-09  2:04 Zhang, Yanmin
2006-01-09  1:48 Zhang, Yanmin
2006-01-06  9:12 Zhang, Yanmin
2006-01-06  9:28 ` Prasanna S Panchamukhi
2006-01-06  9:08 Zhang, Yanmin
2006-01-06 10:22 ` Prasanna S Panchamukhi
2006-01-06 10:30   ` Roland McGrath
2006-01-06  5:29 Zhang, Yanmin
2006-01-06  9:08 ` Prasanna S Panchamukhi
2006-01-06  5:22 Zhang, Yanmin
2006-01-06  9:04 ` Prasanna S Panchamukhi
2006-01-06  3:20 Zhang, Yanmin
2006-01-06  8:53 ` Prasanna S Panchamukhi
2006-01-06  2:52 Zhang, Yanmin
2006-01-06  6:53 ` Prasanna S Panchamukhi
2006-01-05  7:09 Zhang, Yanmin
2006-01-05 11:27 ` Prasanna S Panchamukhi
2005-12-22 13:24 Zhang, Yanmin
2006-01-05 11:10 ` Prasanna S Panchamukhi
2005-12-22  7:14 Zhang, Yanmin
2005-12-22  5:41 Zhang, Yanmin
2005-12-22  6:00 ` Vara Prasad
2006-01-05 11:06 ` Prasanna S Panchamukhi
2005-12-22  5:34 Zhang, Yanmin
2006-01-05 10:30 ` Prasanna S Panchamukhi
2005-12-22  5:09 Zhang, Yanmin
2006-01-05 10:29 ` Prasanna S Panchamukhi
2005-12-21  8:31 Zhang, Yanmin
2006-01-05 10:28 ` Prasanna S Panchamukhi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).