public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
@ 2011-12-21 22:38 fche at redhat dot com
  2011-12-21 22:59 ` [Bug uprobes/13539] " fche at redhat dot com
                   ` (13 more replies)
  0 siblings, 14 replies; 17+ messages in thread
From: fche at redhat dot com @ 2011-12-21 22:38 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

             Bug #: 13539
           Summary: occasional oops, kernel SEGV, RHEL5,
                    :uprobes:uprobe_free_process+0xba/0x131
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: uprobes
        AssignedTo: systemtap@sourceware.org
        ReportedBy: fche@redhat.com
    Classification: Unclassified


Similar to bug #10332, the following Oops can sometimes be seen:

stap_f851e90ddc08f1485de5ede118eb805a_2165: systemtap: 1.7/0.152, base:
ffffffff88744000, memory: 93data/24text/4ctx/2058net/3
3alloc kb, probes: 2, unpriv-uid: 0
stap_d28d88f749ad814bb6852f92fe4169d7_3448: systemtap: 1.7/0.152, base:
ffffffff88744000, memory: 130data/24text/4ctx/2058net/
33alloc kb, probes: 7, unpriv-uid: 0
Unable to handle kernel paging request at 0000000000200200 RIP: 
 [<ffffffff88702f09>] :uprobes:uprobe_free_process+0xba/0x131
PGD 27bb0067 PUD 22076067 PMD 0 
Oops: 0002 [1] SMP 
last sysfs file: /module/zlib_deflate/sections/__versions
CPU 0 
Modules linked in: stap_d28d88f749ad814bb6852f92fe4169d7_3448(U) uprobes(U)
stap_332da8e4c59a2a43e9adc53782bbd601_594(U) zlib_
deflate mtdcore virtio_pci virtio_ring virtio_balloon virtio xt_tcpudp
iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_t
ables netconsole hidp nfs nfs_acl rfcomm l2cap bluetooth lockd sunrpc be2iscsi
ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_cor
e ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi
cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transpor
t_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs
power_meter hwmon i2c_ec dell_wmi wmi button batte
ry asus_acpi acpi_memhotplug ac lp floppy tpm_tis 8139too 8139cp ide_cd
parport_pc tpm cdrom pcspkr mii i2c_piix4 parport tpm_
bios serio_raw i2c_core dm_raid45 dm_message dm_region_hash dm_mem_cache
dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix 
libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 15624, comm: loop Tainted: GF    ---- 2.6.18-300.el5 #1
RIP: 0010:[<ffffffff88702f09>]  [<ffffffff88702f09>]
:uprobes:uprobe_free_process+0xba/0x131
RSP: 0018:ffff81000e4e3e78  EFLAGS: 00010206
RAX: 0000000000100100 RBX: ffff81000dc11e00 RCX: ffff81000dc11f48
RDX: 0000000000200200 RSI: 00000000006e34a0 RDI: 0000000000000000
RBP: ffff81000dc11e00 R08: ffff81002d9a7000 R09: 0000000000000000
R10: 00000000419379e0 R11: 0000000000000246 R12: 00007fff2bc01000
R13: ffff81000dc11f78 R14: ffff81002b61d640 R15: ffff81000e4e3f30
FS:  0000000041937940(0063) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000200200 CR3: 000000003a86d000 CR4: 00000000000006e0
Stack:  ffff81000dc11e00 ffff810034023680 0000000000000000 ffffffff88702fd6
 ffff81000dc11e00 ffffffff887049a1 0000000000000000 ffff81001ab87e80
 0000000000000000 ffff81001d812080 0000000000000003 ffffffff800c5088
Call Trace:
 [<ffffffff88702fd6>] :uprobes:uprobe_put_process+0x56/0xa0
 [<ffffffff887049a1>] :uprobes:uprobe_report_exit+0x141/0x14f
 [<ffffffff800c5088>] utrace_report_exit+0x57/0x17d
 [<ffffffff800156a8>] do_exit+0xdb/0x955
 [<ffffffff8009557b>] complete_and_exit+0x0/0x16
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 48 89 02 74 04 48 89 50 08 48 c7 85 48 01 00 00 00 01 10 00 
RIP  [<ffffffff88702f09>] :uprobes:uprobe_free_process+0xba/0x131
 RSP <ffff81000e4e3e78>

     eea:       e8 00 00 00 00          callq  eef <uprobe_free_process+0xa0>
                        eeb: R_X86_64_PC32      kfree+0xfffffffffffffffc
     eef:       48 8d 8d 48 01 00 00    lea    0x148(%rbp),%rcx
     ef6:       48 8b 51 08             mov    0x8(%rcx),%rdx
     efa:       48 85 d2                test   %rdx,%rdx
     efd:       74 26                   je     f25 <uprobe_free_process+0xd6>
     eff:       48 8b 85 48 01 00 00    mov    0x148(%rbp),%rax
     f06:       48 85 c0                test   %rax,%rax
>>>  f09:       48 89 02                mov    %rax,(%rdx)

This appears to be an inlined copy of __hlist_del:

static inline void __hlist_del(struct hlist_node *n)
{
        struct hlist_node *next = n->next;
     eff:       48 8b 85 48 01 00 00    mov    0x148(%rbp),%rax
        struct hlist_node **pprev = n->pprev;
        *pprev = next;
        if (next)
     f06:       48 85 c0                test   %rax,%rax
     f09:       48 89 02                mov    %rax,(%rdx)
     f0c:       74 04                   je     f12 <uprobe_free_process+0xc3>
                next->pprev = pprev;
     f0e:       48 89 50 08             mov    %rdx,0x8(%rax)
}

in particular the *pprev = next; statement.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
@ 2011-12-21 22:59 ` fche at redhat dot com
  2011-12-22  1:38 ` fche at redhat dot com
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2011-12-21 22:59 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #1 from Frank Ch. Eigler <fche at redhat dot com> 2011-12-21 22:38:13 UTC ---
Another version of the crash:

stap_909f4a49ec3ab955a92f98c10baf87c9_2099: systemtap: 1.7/0.152, base:
ffffffff88712000, memory: 93data/25text/4ctx/2058net/3
3alloc kb, probes: 2, unpriv-uid: 0
stap_ba22ca5697adb576d5fc2abd30243ca3_2219: systemtap: 1.7/0.152, base:
ffffffff88712000, memory: 93data/25text/4ctx/2058net/3
3alloc kb, probes: 2, unpriv-uid: 0
stap_daaddeacf81c54361489519b49af6c39_2257: systemtap: 1.7/0.152, base:
ffffffff88712000, memory: 93data/25text/4ctx/2058net/3
3alloc kb, probes: 2, unpriv-uid: 0
Unable to handle kernel paging request at 0000000000200200 RIP: 
 [<ffffffff887025cf>] :uprobes:uprobe_free_task+0x21/0xbb
PGD 256c6067 PUD 1dfc3067 PMD 0 
Oops: 0002 [1] SMP 
last sysfs file: /module/xt_tcpudp/sections/__versions
CPU 0 
Modules linked in: stap_daaddeacf81c54361489519b49af6c39_2257(U) uprobes(U)
virtio_pci virtio_ring virtio_balloon virtio xt_tc
pudp iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables netconsole
hidp nfs nfs_acl rfcomm l2cap bluetooth lockd sun
rpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb
3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2
scsi_transport_iscsi loop dm_multipath scsi_dh video back
light sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi
acpi_memhotplug ac lp floppy 8139too 8139cp mii i2c_p
iix4 i2c_core parport_pc parport ide_cd serio_raw cdrom tpm_tis tpm tpm_bios
pcspkr dm_raid45 dm_message dm_region_hash dm_mem
_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod
scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4554, comm: loop Tainted: G     ---- 2.6.18-300.el5 #1
RIP: 0010:[<ffffffff887025cf>]  [<ffffffff887025cf>]
:uprobes:uprobe_free_task+0x21/0xbb
RSP: 0018:ffff810024ec1e88  EFLAGS: 00010006
RAX: 0000000000100100 RBX: ffff810018a5da00 RCX: 0000000000000000
RDX: 0000000000200200 RSI: 0000000000000296 RDI: ffffffff88708880
RBP: ffff810031ad6580 R08: ffff810024ec0000 R09: 00000000aab470c6
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: ffff810018a5db78 R14: ffff810020bbb580 R15: ffff810024ec1f30
FS:  0000000041c3d940(0063) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000200200 CR3: 000000002438a000 CR4: 00000000000006e0
Process loop (pid: 4554, threadinfo ffff810024ec0000, task ffff81003e38b7a0)
Stack:  ffff810018a5da00 ffff810031ad6580 0000000000000000 ffffffff88704940
 0000000000000000 ffff810019e6dd40 0000000000000000 ffff81003e38b7a0
 0000000000000003 ffffffff800c5088 ffff8100317ab800 ffff810020bbb590
Call Trace:
 [<ffffffff88704940>] :uprobes:uprobe_report_exit+0xe0/0x14f
 [<ffffffff800c5088>] utrace_report_exit+0x57/0x17d
 [<ffffffff800156a8>] do_exit+0xdb/0x955
 [<ffffffff8009557b>] complete_and_exit+0x0/0x16
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 48 89 02 74 04 48 89 50 08 48 c7 c7 80 88 70 88 48 c7 45 00 
RIP  [<ffffffff887025cf>] :uprobes:uprobe_free_task+0x21/0xbb
 RSP <ffff810024ec1e88>

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
  2011-12-21 22:59 ` [Bug uprobes/13539] " fche at redhat dot com
@ 2011-12-22  1:38 ` fche at redhat dot com
  2011-12-22 12:20 ` fche at redhat dot com
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2011-12-22  1:38 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #2 from Frank Ch. Eigler <fche at redhat dot com> 2011-12-21 22:58:56 UTC ---
Extra data points.  These crashes are LIST_POISON* related.  The test
case being run appears to be the new unprivileged_myproc.exp, which
runs a multithreaded .../loop.c program.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
  2011-12-21 22:59 ` [Bug uprobes/13539] " fche at redhat dot com
  2011-12-22  1:38 ` fche at redhat dot com
@ 2011-12-22 12:20 ` fche at redhat dot com
  2011-12-31 14:27 ` fche at redhat dot com
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2011-12-22 12:20 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> 2011-12-22 01:37:34 UTC ---
Same thing happens on i686, in same place (__hlist_del):

static void uprobe_free_process(struct uprobe_process *uproc)                   
{
  ... 
  if (!hlist_unhashed(&uproc->hlist))
    hlist_del(&uproc->hlist);
}

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (2 preceding siblings ...)
  2011-12-22 12:20 ` fche at redhat dot com
@ 2011-12-31 14:27 ` fche at redhat dot com
  2012-01-02 22:31 ` fche at redhat dot com
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2011-12-31 14:27 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1
           Severity|normal                      |critical

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (3 preceding siblings ...)
  2011-12-31 14:27 ` fche at redhat dot com
@ 2012-01-02 22:31 ` fche at redhat dot com
  2012-01-06 12:25   ` Srikar Dronamraju
  2012-01-06 16:57 ` dsmith at redhat dot com
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 17+ messages in thread
From: fche at redhat dot com @ 2012-01-02 22:31 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2
           Severity|critical                    |normal

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> 2012-01-02 22:30:45 UTC ---
Some code from Jim Keniston, later adapted by yours truly, makes some
progress on the race conditions present in runtime/uprobes{,2}.  It
seems like something more dramatic will be required.

I pushed my current working changes to the new branch pr13539.  Test
it with # make installcheck RUNTESTFLAGS=unprivileged_myproc.exp on
an SMP (virtual?) machine.  I can reproduce some problem or another
on a rhel5 2.6.18-300{,debug}.{i686,x86_64} box easily, and less
easily on rhel6/fedoras.  (On the latter, run it in a loop.)

The original race condition was that the "./loop 1" program's threads
killed themselves right around the same time as the stap module 
decides to unregister probes due to the probe handler's { exit() }.
The effect is that the suicide uprobe_report_* callbacks race with
uprobe_free_{task,process} coming in from uprobe_put_process.  One
of them ends up deallocating the uprobe_proc struct, the other ends
up trying to take a semaphore, or muck with a hlist node, in the
resulting freed block.

The current status of the pr13539 branch works around some of the
various possible races, but now gets stuck in the post-exit (?)
utrace-quiesce loop of the target "./loop 1" process:

[1480353.094558] stap_3960d10ec2d1cdbbd5924a89713e08c4_2157: systemtap:
1.7/0.152, base: ffffffff88740000, memory: 94data/25text/2ctx/2058net/34alloc
kb, probes: 2, unpriv-uid: 0
[1480353.107405] uprobe_report_clone ffff81003c627138 14025=14025
[1480353.118383] uprobe_report_clone2 ffff81003c627138 14025=14025
[1480353.122169] uprobe_report_exit ffff81003c627138 14025=14028
[1480353.125266] uprobe_report_quiesce ffff81003c627138 14025=14025
[1480353.128373] uprobe_report_quiesce2 ffff81003c627138 14025=14025
[1480353.130829] uprobe_report_quiesce3 ffff81003c627138 14025=14025
[1480353.133212] uprobe_report_exit1a ffff81003c627138 14025=14028
[1480353.135620] uprobe_report_exit2 ffff81003c627138 14025=14028
[1480353.138275] uprobe_free_task ffff81000f69aa48 (tid 14028), caller
ffffffff88718bfcS, ctid 14028
[1480353.142031] uprobe_report_exit3 ffff81003c627138 14025=14028
[1480353.144330] uprobe_report_exit4 ffff81003c627138 14025=14028
[1480353.157461] uprobe_free_process ffff81003c627138 (pid 14025), caller
ffffffff88717048S, ctid 14028
[1480353.161439] uprobe_free_task ffff81000f69a5e8 (tid 14025), caller
ffffffff88716fb2S, ctid 14028
[1480353.165132] uprobe_free_process zap ffff81003c627138

[sysrq-t sez: ...]

[1486034.240725] stap          X ffff8100131a9588     0 13933  12033           
         (L-TLB)
[1486034.245729]  ffff810010a0df08 0000000000000046 ffff810019064d60
0000000000000246
[1486034.263375]  ffff810013733e70 0000000000000009 ffff810019064700
ffffffff8032ed40
[1486034.269112]  0005425fab97e401 00000000007b7869 ffff8100190648e8
0000000013733e60
[1486034.273530] Call Trace:
[1486034.276317]  [<ffffffff800ce099>] check_dead_utrace+0x11c/0x185
[1486034.278710]  [<ffffffff80016bc3>] do_exit+0x96c/0x978
[1486034.280831]  [<ffffffff8004b58f>] debug_mutex_init+0x0/0x3b
[1486034.283114]  [<ffffffff800602a6>] tracesys+0xd5/0xdf
[1486034.285251] 

[1486034.286547] loop          t ffff8100218fb148     0 14025      1           
    3882 (NOTLB)
[1486034.301736]  ffff810010a17cf8 0000000000000046 0000000000000246
ffffffff802a3700
[1486034.306797]  ffffffff8871d0a0 0000000000000007 ffff81000fbe25c0
ffff810014d54340
[1486034.311027]  0005425f92a59fff 0000000000796f2c ffff81000fbe27a8
0000000200000001
[1486034.314436] Call Trace:
[1486034.317175]  [<ffffffff800ce543>] utrace_quiescent+0xe6/0x26d
[1486034.320411]  [<ffffffff800cf503>] utrace_get_signal+0x4f8/0x55b
[1486034.323536]  [<ffffffff8002c804>] get_signal_to_deliver+0x5a/0x4b9
[1486034.326940]  [<ffffffff8002c930>] get_signal_to_deliver+0x186/0x4b9
[1486034.339013]  [<ffffffff8005d427>] do_notify_resume+0x9c/0x7b0
[1486034.341901]  [<ffffffff800936cf>] default_wake_function+0x0/0xe
[1486034.350764]  [<ffffffff80032cd6>] do_fork+0x148/0x1c1
[1486034.353004]  [<ffffffff80067fb8>] trace_hardirqs_off_thunk+0x35/0x67
[1486034.355512]  [<ffffffff8006035f>] int_signal+0x12/0x17

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2012-01-02 22:31 ` fche at redhat dot com
@ 2012-01-06 12:25   ` Srikar Dronamraju
  2012-01-06 17:01     ` David Smith
  0 siblings, 1 reply; 17+ messages in thread
From: Srikar Dronamraju @ 2012-01-06 12:25 UTC (permalink / raw)
  To: Frank Ch Eigler; +Cc: systemtap


While I still cannot see a reason how uprobe_{free,put}_process can 
race uprobe_report_{exit,exec}, I certainly think somebit of cleanup 
can be done.  However I am dont think we need to do a utask or uproc lookup
from the table. Especially in case of callbacks.

Mostly similar to what Jim proposed.
I havent tested this patch myself and I couldnt reproduce the problem.

---

 runtime/uprobes/uprobes.c  |  106 ++++++++++++++++++++++++++++++++-----------
 runtime/uprobes2/uprobes.c |  105 ++++++++++++++++++++++++++++---------------
 2 files changed, 148 insertions(+), 63 deletions(-)

diff --git a/runtime/uprobes/uprobes.c b/runtime/uprobes/uprobes.c
index 47fb1c9..ed1e94b 100644
--- a/runtime/uprobes/uprobes.c
+++ b/runtime/uprobes/uprobes.c
@@ -114,33 +114,39 @@ struct delayed_signal {
 	siginfo_t info;
 };
 
-static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk)
+static struct uprobe_task *uprobe_find_utask_locked(struct task_struct *tsk)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
 	struct uprobe_task *utask;
-	unsigned long flags;
 
 	head = &utask_table[hash_ptr(tsk, UPROBE_HASH_BITS)];
-	lock_utask_table(flags);
 	hlist_for_each_entry(utask, node, head, hlist) {
-		if (utask->tsk == tsk) {
-			unlock_utask_table(flags);
+		if (utask->tsk == tsk)
 			return utask;
-		}
 	}
-	unlock_utask_table(flags);
 	return NULL;
 }
 
+static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk)
+{
+        struct uprobe_task *utask;
+        unsigned long flags;
+
+        lock_utask_table(flags);
+        utask = uprobe_find_utask_locked(tsk);
+        unlock_utask_table(flags);
+        return utask;
+}
+
 static void uprobe_hash_utask(struct uprobe_task *utask)
 {
 	struct hlist_head *head;
 	unsigned long flags;
 
 	INIT_HLIST_NODE(&utask->hlist);
-	head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)];
 	lock_utask_table(flags);
+	head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)];
 	hlist_add_head(&utask->hlist, head);
 	unlock_utask_table(flags);
 }
@@ -154,9 +160,11 @@ static void uprobe_unhash_utask(struct uprobe_task *utask)
 	unlock_utask_table(flags);
 }
 
-static inline void uprobe_get_process(struct uprobe_process *uproc)
+static struct uprobe_process * uprobe_get_process(struct uprobe_process *uproc)
 {
-	atomic_inc(&uproc->refcount);
+	if (atomic_inc_not_zero(&uproc->refcount))
+		return uproc;
+	return NULL;
 }
 
 /*
@@ -186,8 +194,9 @@ static struct uprobe_process *uprobe_find_process(pid_t tgid)
 	head = &uproc_table[hash_long(tgid, UPROBE_HASH_BITS)];
 	hlist_for_each_entry(uproc, node, head, hlist) {
 		if (uproc->tgid == tgid && !uproc->finished) {
-			uprobe_get_process(uproc);
-			down_write(&uproc->rwsem);
+			uproc = uprobe_get_process(uproc);
+			if (uproc)
+				down_write(&uproc->rwsem);
 			return uproc;
 		}
 	}
@@ -476,6 +485,12 @@ static void uprobe_free_task(struct uprobe_task *utask)
 	struct deferred_registration *dr, *d;
 	struct delayed_signal *ds, *ds2;
 
+        printk(KERN_INFO "uprobe_free_task %p (tid %ld), caller %pS, ctid %ld\n", utask, utask->tsk->pid, _RET_IP_, current->pid);
+
+        /*
+         * Do this first, since a utask that's still in the utask_table
+         * is assumed (e.g., by uprobe_report_exit) to be valid.
+         */
 	uprobe_unhash_utask(utask);
 	list_del(&utask->list);
 	list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) {
@@ -487,7 +502,6 @@ static void uprobe_free_task(struct uprobe_task *utask)
 		list_del(&ds->list);
 		kfree(ds);
 	}
-
 	utask_free_uretprobe_instances(utask);
 
 	kfree(utask);
@@ -499,12 +513,13 @@ static void uprobe_free_process(struct uprobe_process *uproc)
 	struct uprobe_task *utask, *tmp;
 	struct uprobe_ssol_area *area = &uproc->ssol_area;
 
+        printk(KERN_INFO "uprobe_free_process %p (pid %ld), caller %pS, ctid %ld\n", uproc, uproc->tgid, _RET_IP_, current->pid);
+
 	if (!uproc->finished)
 		uprobe_release_ssol_vma(uproc);
 	if (area->slots)
 		kfree(area->slots);
-	if (!hlist_unhashed(&uproc->hlist))
-		hlist_del(&uproc->hlist);
+	hlist_del(&uproc->hlist);
 	list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list) {
 		/*
 		 * utrace_detach() is OK here (required, it seems) even if
@@ -515,6 +530,7 @@ static void uprobe_free_process(struct uprobe_process *uproc)
 		uprobe_free_task(utask);
 	}
 	up_write(&uproc->rwsem);	// So kfree doesn't complain
+        printk(KERN_INFO "uprobe_free_process zap %p\n", uproc);
 	kfree(uproc);
 }
 
@@ -539,7 +555,9 @@ static int uprobe_put_process(struct uprobe_process *uproc)
 			/*
 			 * The works because uproc_mutex is held any
 			 * time the ref count can go from 0 to 1 -- e.g.,
-			 * register_uprobe() sneaks in with a new probe.
+			 * register_uprobe() snuck in with a new probe,
+                         * or a callback such as uprobe_report_exit()
+                         * just started.
 			 */
 			up_write(&uproc->rwsem);
 		} else {
@@ -852,7 +870,7 @@ static int uprobe_validate_vma(struct task_struct *t, unsigned long vaddr)
 	mmput(mm);
 	return ret;
 }
-	
+
 /* Probed address must be in an executable VM area, outside the SSOL area. */
 static int uprobe_validate_vaddr(struct task_struct *p, unsigned long vaddr,
 	struct uprobe_process *uproc)
@@ -1961,9 +1979,15 @@ static u32 uprobe_report_quiesce(struct utrace_attached_engine *engine,
 	struct uprobe_task *utask;
 	struct uprobe_process *uproc;
 
+	rcu_read_lock();
 	utask = (struct uprobe_task *)rcu_dereference(engine->data);
 	BUG_ON(!utask);
-	uproc = utask->uproc;
+	uproc = uprobe_get_process(utask->uproc);
+	rcu_read_unlock();
+
+	if (!uproc)
+		return UTRACE_ACTION_DETACH|UTRACE_ACTION_RESUME;
+
 	if (current == utask->quiesce_master) {
 		/*
 		 * tsk was already quiescent when quiesce_all_threads()
@@ -1977,8 +2001,11 @@ static u32 uprobe_report_quiesce(struct utrace_attached_engine *engine,
 	}
 
 	BUG_ON(utask->active_probe);
+
 	down_write(&uproc->rwsem);
 
+        printk(KERN_INFO "uprobe_report_quiesce2 %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
+
 	/*
 	 * When a thread hits a breakpoint or single-steps, utrace calls
 	 * this quiesce callback before our signal callback.  We must
@@ -1998,6 +2025,8 @@ static u32 uprobe_report_quiesce(struct utrace_attached_engine *engine,
 	check_uproc_quiesced(uproc, tsk);
 done:
 	up_write(&uproc->rwsem);
+	uprobe_put_process(uproc);
+        printk(KERN_INFO "uprobe_report_quiesce3 %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
 	return UTRACE_ACTION_RESUME;
 }
 
@@ -2074,8 +2103,14 @@ static u32 uprobe_report_exit(struct utrace_attached_engine *engine,
 	int utask_quiescing;
 
 	utask = (struct uprobe_task *)rcu_dereference(engine->data);
-	uproc = utask->uproc;
-	uprobe_get_process(uproc);
+	if (utask)
+		uproc = uprobe_get_process(utask->uproc);
+
+	if (!utask || !uproc)
+		/* uprobe_free_process() has probably clobbered utask->proc. */
+		return UTRACE_ACTION_DETACH;
+
+        printk(KERN_INFO "uprobe_report_exit %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
 
 	ppt = utask->active_probe;
 	if (ppt) {
@@ -2108,6 +2143,9 @@ static u32 uprobe_report_exit(struct utrace_attached_engine *engine,
 	}
 
 	down_write(&uproc->rwsem);
+
+        printk(KERN_INFO "uprobe_report_exit2 %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
+
 	utask_quiescing = utask->quiescing;
 	uprobe_free_task(utask);
 
@@ -2129,6 +2167,8 @@ static u32 uprobe_report_exit(struct utrace_attached_engine *engine,
 		uprobe_cleanup_process(uproc);
 	}
 	up_write(&uproc->rwsem);
+        printk(KERN_INFO "uprobe_report_exit3 %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
+
 	uprobe_put_process(uproc);
 
 	return UTRACE_ACTION_DETACH;
@@ -2241,7 +2281,7 @@ static int uprobe_fork_uproc(struct uprobe_process *parent_uproc,
 	child_utask = uprobe_find_utask(child_tsk);
 	BUG_ON(!child_utask);
 	ret = uprobe_fork_uretprobe_instances(parent_utask, child_utask);
-	
+
 	hlist_add_head(&child_uproc->hlist,
 			&uproc_table[hash_long(child_uproc->tgid,
 			UPROBE_HASH_BITS)]);
@@ -2271,6 +2311,7 @@ static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
 	ptask = (struct uprobe_task *)rcu_dereference(engine->data);
 	uproc = ptask->uproc;
 
+        printk(KERN_INFO "uprobe_report_clone %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
 	/*
 	 * Lock uproc so no new uprobes can be installed 'til all
 	 * report_clone activities are completed.  Lock uproc_table
@@ -2280,6 +2321,8 @@ static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
 	down_write(&uproc->rwsem);
 	get_task_struct(child);
 
+        printk(KERN_INFO "uprobe_report_clone2 %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
+
 	if (clone_flags & (CLONE_THREAD|CLONE_VM)) {
 		/* New thread in the same process (CLONE_THREAD) or
 		 * processes sharing the same memory space (CLONE_VM). */
@@ -2323,7 +2366,7 @@ static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
 				}
 			}
 		}
-		
+
 		if (!hlist_empty(&ptask->uretprobe_instances))
 			(void) uprobe_fork_uproc(uproc, ptask, child);
 	}
@@ -2360,8 +2403,14 @@ static u32 uprobe_report_exec(struct utrace_attached_engine *engine,
 	u32 ret = UTRACE_ACTION_RESUME;
 
 	utask = (struct uprobe_task *)rcu_dereference(engine->data);
-	uproc = utask->uproc;
-	uprobe_get_process(uproc);
+	if (utask)
+		uproc = uprobe_get_process(utask->uproc);
+
+	if (!utask || !uproc)
+		/* uprobe_free_process() has probably clobbered utask->proc. */
+		return UTRACE_ACTION_DETACH;
+
+        printk(KERN_INFO "uprobe_report_exec %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
 
 	/*
 	 * Only cleanup if we're the last thread.  If we aren't,
@@ -2390,11 +2439,14 @@ static u32 uprobe_report_exec(struct utrace_attached_engine *engine,
                 ret = UTRACE_ACTION_DETACH;
 	}
 	up_write(&uproc->rwsem);
+        printk(KERN_INFO "uprobe_report_exec2 %p %ld=%ld\n", uproc, uproc->tgid, current->pid);
 
 	/* If any [un]register_uprobe is pending, it'll clean up. */
 	if (uprobe_put_process(uproc))
 		ret = UTRACE_ACTION_DETACH;
 
+        printk(KERN_INFO "uprobe_report_exec4 %p %ld=%ld ret=%lu\n", uproc, uproc->tgid, current->pid, (unsigned long)ret);
+
 	return ret;
 }
 
@@ -2631,7 +2683,7 @@ static inline unsigned long lookup_uretprobe(struct hlist_node *r,
 {
 	struct uretprobe_instance *ret_inst;
 	unsigned long trampoline_addr;
-	
+
 	if (IS_ERR(uproc->uretprobe_trampoline_addr))
 	  return pc;
 	trampoline_addr = (unsigned long)uproc->uretprobe_trampoline_addr;
@@ -2672,7 +2724,7 @@ unsigned long uprobe_get_pc(struct uretprobe_instance *ri, unsigned long pc,
 		if (!uk)
 			return 0;
 		uproc = uk->ppt->uproc;
-		r = &ri->hlist;		
+		r = &ri->hlist;
 	}
 	return lookup_uretprobe(r, uproc, pc, sp);
 }
@@ -2685,7 +2737,7 @@ unsigned long uprobe_get_pc_task(struct task_struct *task, unsigned long pc,
 	struct uprobe_task *utask;
 	struct uprobe_process *uproc;
 	unsigned long result;
-		
+
 	utask = uprobe_find_utask(task);
 	if (!utask) {
 		return pc;
diff --git a/runtime/uprobes2/uprobes.c b/runtime/uprobes2/uprobes.c
index b3c6e57..953ddf3 100644
--- a/runtime/uprobes2/uprobes.c
+++ b/runtime/uprobes2/uprobes.c
@@ -114,33 +114,39 @@ struct delayed_signal {
 	siginfo_t info;
 };
 
-static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk)
+static struct uprobe_task *uprobe_find_utask_locked(struct task_struct *tsk)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
 	struct uprobe_task *utask;
-	unsigned long flags;
 
 	head = &utask_table[hash_ptr(tsk, UPROBE_HASH_BITS)];
-	lock_utask_table(flags);
 	hlist_for_each_entry(utask, node, head, hlist) {
-		if (utask->tsk == tsk) {
-			unlock_utask_table(flags);
+		if (utask->tsk == tsk)
 			return utask;
-		}
 	}
-	unlock_utask_table(flags);
 	return NULL;
 }
 
+static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk)
+{
+	struct uprobe_task *utask;
+	unsigned long flags;
+
+	lock_utask_table(flags);
+	utask = uprobe_find_utask_locked(tsk);
+	unlock_utask_table(flags);
+	return utask;
+}
+
 static void uprobe_hash_utask(struct uprobe_task *utask)
 {
 	struct hlist_head *head;
 	unsigned long flags;
 
 	INIT_HLIST_NODE(&utask->hlist);
-	head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)];
 	lock_utask_table(flags);
+	head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)];
 	hlist_add_head(&utask->hlist, head);
 	unlock_utask_table(flags);
 }
@@ -154,9 +160,11 @@ static void uprobe_unhash_utask(struct uprobe_task *utask)
 	unlock_utask_table(flags);
 }
 
-static inline void uprobe_get_process(struct uprobe_process *uproc)
+static inline uprobe_process * uprobe_get_process(struct uprobe_process *uproc)
 {
-	atomic_inc(&uproc->refcount);
+	if (atomic_inc_not_zero(&uproc->refcount))
+		return uproc;
+	return NULL;
 }
 
 /*
@@ -186,8 +194,9 @@ static struct uprobe_process *uprobe_find_process(struct pid *tg_leader)
 	head = &uproc_table[hash_ptr(tg_leader, UPROBE_HASH_BITS)];
 	hlist_for_each_entry(uproc, node, head, hlist) {
 		if (uproc->tg_leader == tg_leader && !uproc->finished) {
-			uprobe_get_process(uproc);
-			down_write(&uproc->rwsem);
+			uproc = uprobe_get_process(uproc);
+			if (uproc)
+				down_write(&uproc->rwsem);
 			return uproc;
 		}
 	}
@@ -547,24 +556,32 @@ static void uprobe_free_task(struct uprobe_task *utask, bool in_callback)
 	struct deferred_registration *dr, *d;
 	struct delayed_signal *ds, *ds2;
 
-	if (utask->engine && (utask->tsk != current || !in_callback)) {
-		int result;
+	/*
+	 * Do this first, since a utask that's still in the utask_table
+	 * is assumed (e.g., by uprobe_report_exit) to be valid.
+	 */
+	uprobe_unhash_utask(utask);
 
+	if (utask->engine && (utask->tsk != current || !in_callback)) {
 		/*
-		 * No other tasks in this process should be running
-		 * uprobe_report_* callbacks.  (If they are, utrace_barrier()
-		 * here could deadlock.)
+		 * If we're racing with (say) uprobe_report_exit() here,
+		 * utrace_control_pid() may fail with -EINPROGRESS.  That's
+		 * OK.  The callback will abort with UTRACE_DETACH after
+		 * we're done.  It is NOT OK to call utrace_barrier() here,
+		 * since the callback would probably deadlock awaiting
+		 * uproc->rwsem.
+		 *
 		 * utrace_control_pid calls task_pid() so we should hold the
 		 * rcu_read_lock.  */
 		rcu_read_lock();
-		result = utrace_control_pid(utask->pid, utask->engine,
-					    UTRACE_DETACH);
+		if (utrace_control_pid(utask->pid, utask->engine,
+						UTRACE_DETACH) != 0)
+			/* Ignore it. */
+			;
 		rcu_read_unlock();
-		BUG_ON(result == -EINPROGRESS);
 	}
 	put_pid(utask->pid);	/* null pid OK */
 
-	uprobe_unhash_utask(utask);
 	list_del(&utask->list);
 	list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) {
 		list_del(&dr->list);
@@ -594,8 +611,7 @@ static void uprobe_free_process(struct uprobe_process *uproc, int in_callback)
 
 	if (area->slots)
 		kfree(area->slots);
-	if (!hlist_unhashed(&uproc->hlist))
-		hlist_del(&uproc->hlist);
+	hlist_del(&uproc->hlist);
 	list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list)
 		uprobe_free_task(utask, in_callback);
 	put_pid(uproc->tg_leader);
@@ -622,9 +638,9 @@ static int uprobe_put_process(struct uprobe_process *uproc, bool in_callback)
 		down_write(&uproc->rwsem);
 		if (unlikely(atomic_read(&uproc->refcount) != 0)) {
 			/*
-			 * The works because uproc_mutex is held any
-			 * time the ref count can go from 0 to 1 -- e.g.,
-			 * register_uprobe() sneaks in with a new probe.
+			 * register_uprobe() snuck in with a new probe,
+			 * or a callback such as uprobe_report_exit()
+			 * just started.
 			 */
 			up_write(&uproc->rwsem);
 		} else {
@@ -1771,9 +1787,10 @@ static int utask_fake_quiesce(struct uprobe_task *utask)
 	} else {
 		utask->state = UPTASK_SLEEPING;
 		uproc->n_quiescent_threads++;
-		up_write(&uproc->rwsem);
+
 		/* We ref-count sleepers. */
 		uprobe_get_process(uproc);
+		up_write(&uproc->rwsem);
 
 		wait_event(uproc->waitq, !utask->quiescing);
 
@@ -1921,9 +1938,15 @@ static u32 uprobe_report_signal(u32 action,
 	rcu_read_lock();
 	utask = (struct uprobe_task *)rcu_dereference(engine->data);
 	BUG_ON(!utask);
-	uproc = utask->uproc;
+	/* Keep uproc intact until just before we return. */
+	uproc = uprobe_get_process(utask->uproc);
+
 	rcu_read_unlock();
 
+	if (!uproc)
+		/* uprobe_free_process() has probably clobbered utask->proc. */
+		return UTRACE_SIGNAL_IGN | UTRACE_DETACH;
+
 	/*
 	 * We may need to re-assert UTRACE_SINGLESTEP if this signal
 	 * is not associated with the breakpoint.
@@ -1933,9 +1956,6 @@ static u32 uprobe_report_signal(u32 action,
 	else
 		resume_action = UTRACE_RESUME;
 
-	/* Keep uproc intact until just before we return. */
-	uprobe_get_process(uproc);
-
 	if (unlikely(signal_action == UTRACE_SIGNAL_REPORT)) {
 		/* This thread was quiesced using UTRACE_INTERRUPT. */
 		bool done_quiescing;
@@ -2188,7 +2208,7 @@ static u32 uprobe_report_quiesce(
 		return UTRACE_SINGLESTEP;
 
 	BUG_ON(utask->active_probe);
-	uproc = utask->uproc;
+	uproc = uprobe_get_process(utask->uproc);
 	down_write(&uproc->rwsem);
 #if 0
 	// TODO: Is this a concern any more?
@@ -2211,6 +2231,7 @@ static u32 uprobe_report_quiesce(
 	done_quiescing = utask_quiesce(utask);
 // done:
 	up_write(&uproc->rwsem);
+	uprobe_put_process(utask->uproc);
 	return (done_quiescing ? UTRACE_RESUME : UTRACE_STOP);
 }
 
@@ -2295,9 +2316,15 @@ static u32 uprobe_report_exit(enum utrace_resume_action action,
 
 	rcu_read_lock();
 	utask = (struct uprobe_task *)rcu_dereference(engine->data);
-	uproc = utask->uproc;
+	BUG_ON(!utask);
+	/* Keep uproc intact until just before we return. */
+	uproc = uprobe_get_process(utask->uproc);
+
 	rcu_read_unlock();
-	uprobe_get_process(uproc);
+
+	if (!uproc)
+		/* uprobe_free_process() has probably clobbered utask->proc. */
+		return UTRACE_DETACH;
 
 	ppt = utask->active_probe;
 	if (ppt) {
@@ -2626,9 +2653,15 @@ static u32 uprobe_report_exec(
 
 	rcu_read_lock();
 	utask = (struct uprobe_task *)rcu_dereference(engine->data);
-	uproc = utask->uproc;
+	BUG_ON(!utask);
+	/* Keep uproc intact until just before we return. */
+	uproc = uprobe_get_process(utask->uproc);
+
 	rcu_read_unlock();
-	uprobe_get_process(uproc);
+
+	if (!uproc)
+		/* uprobe_free_process() has probably clobbered utask->proc. */
+		return UTRACE_DETACH;
 
 	/*
 	 * Only cleanup if we're the last thread.  If we aren't,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (4 preceding siblings ...)
  2012-01-02 22:31 ` fche at redhat dot com
@ 2012-01-06 16:57 ` dsmith at redhat dot com
  2012-01-06 17:01 ` dsmith at redhat dot com
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2012-01-06 16:57 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

David Smith <dsmith at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dsmith at redhat dot com

--- Comment #5 from David Smith <dsmith at redhat dot com> 2012-01-06 16:56:25 UTC ---
I've reproduced this oops.  Here's how.  I wasn't able to reproduce this in a
vm.  Here are the details of the hardware/software I've duplicated this on.

x86_64
2.6.18-301.el5debug
4 cpus
1Gb memory

Note that I couldn't duplicate the oops with the regular kernel, only the debug
kernel.

with the code on the pr13539 branch, I'm seeing a hang in stapio:

    Jan 5 15:48:27 hp-dl140g2-01 kernel:
stap_29f5d7d6b50882b60786646febccf5da_2100: systemtap: 1.7/0.152, base:
ffffffff8862e000, memory: 94data/25text/2ctx/2058net/34alloc kb, probes: 2,
unpriv-uid: 0
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_clone ffff810023d7e118
4525=4525
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_clone2 ffff810023d7e118
4525=4525
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit ffff810023d7e118
4525=4528
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit1a ffff810023d7e118
4525=4528
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit2 ffff810023d7e118
4525=4528
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_free_task ffff81002c1604d0 (tid
4528), caller ffffffff8861fbfcS, ctid 4528
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit3 ffff810023d7e118
4525=4528
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit4 ffff810023d7e118
4525=4528
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_quiesce ffff810023d7e118
4525=4525
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_quiesce2
ffff810023d7e118 4525=4525
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_quiesce3
ffff810023d7e118 4525=4525
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_free_process ffff810023d7e118
(pid 4525), caller ffffffff8861e048S, ctid 4518
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_free_task ffff81002c160188 (tid
4525), caller ffffffff8861dfb2S, ctid 4518
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_free_process zap
ffff810023d7e118
    Jan 5 15:48:27 hp-dl140g2-01 kernel:
stap_65966cac75d5cb53185970245c71624b_2107: systemtap: 1.7/0.152, base:
ffffffff8862e000, memory: 94data/25text/2ctx/2058net/34alloc kb, probes: 2,
unpriv-uid: 0
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_clone ffff81002bfd7238
4651=4651
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_clone2 ffff81002bfd7238
4651=4651
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_quiesce ffff81002bfd7238
4651=4651
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit ffff81002bfd7238
4651=4654
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_quiesce2
ffff81002bfd7238 4651=4651
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_quiesce3
ffff81002bfd7238 4651=4651
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit1a ffff81002bfd7238
4651=4654
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit2 ffff81002bfd7238
4651=4654
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_free_task ffff81002c6203b8 (tid
4654), caller ffffffff8861fbfcS, ctid 4654
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit3 ffff81002bfd7238
4651=4654
    Jan 5 15:48:27 hp-dl140g2-01 kernel: uprobe_report_exit4 ffff81002bfd7238
4651=4654
    INFO: task stapio:4645 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    stapio D ffff81002b784608 0 4645 4544 4651 4652 (NOTLB)
    ffff81001cc33dc8 0000000000000046 0000000000000000 0000000000000046
    ffff81002bfd7240 0000000000000004 ffff81001ad94640 ffff810037c5e300
    000000967ed8046e 0000000000012432 ffff81001ad94828 000000022c9b70d8
    Call Trace:
    [<ffffffff8861eae3>] :uprobes:__unregister_uprobe+0x1ef/0x22f
    [<ffffffff800a811e>] autoremove_wake_function+0x0/0x2e
    [<ffffffff88632abe>]
:stap_65966cac75d5cb53185970245c71624b_2107:_stp_cleanup_and_exit+0x140/0x311
    [<ffffffff88632dbc>]
:stap_65966cac75d5cb53185970245c71624b_2107:_stp_ctl_write_cmd+0xf1/0xdbd
    [<ffffffff8013d5fd>] file_has_perm+0x48/0xa3
    [<ffffffff80017827>] vfs_write+0xce/0x174
    [<ffffffff8001811b>] sys_write+0x45/0x6e
    [<ffffffff800602a6>] tracesys+0xd5/0xdf

    no locks held by stapio/4645.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2012-01-06 12:25   ` Srikar Dronamraju
@ 2012-01-06 17:01     ` David Smith
  0 siblings, 0 replies; 17+ messages in thread
From: David Smith @ 2012-01-06 17:01 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Frank Ch Eigler, systemtap

On 01/06/2012 06:25 AM, Srikar Dronamraju wrote:

> 
> While I still cannot see a reason how uprobe_{free,put}_process can 
> race uprobe_report_{exit,exec}, I certainly think somebit of cleanup 
> can be done.  However I am dont think we need to do a utask or uproc lookup
> from the table. Especially in case of callbacks.
> 
> Mostly similar to what Jim proposed.
> I havent tested this patch myself and I couldnt reproduce the problem.


With your patch, I got the same stapio hang as I mentioned in comment #5
on the bug, but happened on the 3rd run of the unprivileged_myproc.exp
test case (it has happened on every run with the pr13539 branch code for
me).  So your code is a slight improvement.

That comment also describes my environment in duplicating the bug which
might help you.

-- 

David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (5 preceding siblings ...)
  2012-01-06 16:57 ` dsmith at redhat dot com
@ 2012-01-06 17:01 ` dsmith at redhat dot com
  2012-01-26 16:10 ` fche at redhat dot com
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2012-01-06 17:01 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #6 from David Smith <dsmith at redhat dot com> 2012-01-06 17:00:33 UTC ---
(In reply to comment #5)
> I've reproduced this oops.  Here's how.  I wasn't able to reproduce this in a
> vm.  Here are the details of the hardware/software I've duplicated this on.
> 
> x86_64
> 2.6.18-301.el5debug
> 4 cpus
> 1Gb memory

I forgot to mention that I run the unprivileged_myproc.exp testcase to show the
problem.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (6 preceding siblings ...)
  2012-01-06 17:01 ` dsmith at redhat dot com
@ 2012-01-26 16:10 ` fche at redhat dot com
  2012-01-26 22:35 ` dsmith at redhat dot com
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2012-01-26 16:10 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #7 from Frank Ch. Eigler <fche at redhat dot com> 2012-01-26 16:09:40 UTC ---
With commit f916107c2, I can no longer reproduce this.
I believe a factor unique to my test machine was its
use of netconsole + serial-console, which may have
caused extra disruption with the debugging printk's
included in my modifications to kenistoj's original
patch, and preserved by srikar.  After those printk's
were removed, the tests have been solid.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (7 preceding siblings ...)
  2012-01-26 16:10 ` fche at redhat dot com
@ 2012-01-26 22:35 ` dsmith at redhat dot com
  2012-01-26 22:58 ` jistone at redhat dot com
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2012-01-26 22:35 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #8 from David Smith <dsmith at redhat dot com> 2012-01-26 22:34:19 UTC ---
When testing this on RHEL6 (2.6.32-220.4.1.el6.x86_64.debug), I just saw the
following (which is still better than an oops):

BUG: MAX_LOCKDEP_KEYS too low!
turning off the locking correctness validator.
Pid: 19728, comm: loop Not tainted 2.6.32-220.4.1.el6.x86_64.debug #1
Call Trace:
 [<ffffffff810ac035>] ? register_lock_class+0x235/0x470
 [<ffffffff81013853>] ? native_sched_clock+0x13/0x60
 [<ffffffff81012d29>] ? sched_clock+0x9/0x10
 [<ffffffff810af3b3>] ? __lock_acquire+0x93/0x1570
 [<ffffffff81013853>] ? native_sched_clock+0x13/0x60
 [<ffffffff81012d29>] ? sched_clock+0x9/0x10
 [<ffffffff810b0934>] ? lock_acquire+0xa4/0x120
 [<ffffffffa07ef31e>] ? stap_uprobe_process_found+0x23e/0x490
[stap_20011319d85]
 [<ffffffff810daf2d>] ? get_utrace_lock+0x6d/0x120
 [<ffffffff810b0b53>] ? lock_release_non_nested+0x1a3/0x3a0
 [<ffffffff8151ea13>] ? __mutex_lock_common+0x43/0x400
 [<ffffffffa07ef31e>] ? stap_uprobe_process_found+0x23e/0x490
[stap_20011319d85]
 [<ffffffff8109e2d8>] ? sched_clock_cpu+0xb8/0x110
 [<ffffffff810ab1ed>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8109e41f>] ? cpu_clock+0x6f/0x80
 [<ffffffffa07ef31e>] ? stap_uprobe_process_found+0x23e/0x490
[stap_20011319d85]
 [<ffffffff810ae30d>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff8151eed8>] ? mutex_lock_nested+0x48/0x60
 [<ffffffffa07ef31e>] ? stap_uprobe_process_found+0x23e/0x490
[stap_20011319d85]
 [<ffffffffa07ef66b>] ? __stp_utrace_task_finder_target_quiesce+0xfb/0x3d0
[sta]
 [<ffffffff810ae30d>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff810d8ed0>] ? start_callback+0xa0/0xd0
 [<ffffffff810d9ee4>] ? utrace_resume+0xf4/0x130
 [<ffffffff8100ab10>] ? do_notify_resume+0xb0/0xc0
 [<ffffffff8100b404>] ? int_signal+0x12/0x17
stap_2d33a1bdbf5bbf70ba121e44b1f08a2_20040: systemtap: 1.7/0.152, base:
fffffff0
stap_d47936fcb49b72c6f17c4fd996210e0_20390: systemtap: 1.7/0.152, base:
fffffff0
stap_e2bb710e788d1876db77d75ca35724e_20630: systemtap: 1.7/0.152, base:
fffffff0
stap_41e233c33c9b9cf9ec0bdcb6f1daade_20871: systemtap: 1.7/0.152, base:
fffffff0
stap_cb77720af8b8c98e8880014b96febaf_21110: systemtap: 1.7/0.152, base:
fffffff0
stap_762ae1764505116c3fb959a594e588a_21350: systemtap: 1.7/0.152, base:
fffffff0
stap_27ab0f045c37da028e10caf504af649_21596: systemtap: 1.7/0.152, base:
fffffff0
stap_78d447984a35de9b21e18cb0cab35a8_21835: systemtap: 1.7/0.152, base:
fffffff0
stap_b1011b2aafe027b8f8c0867b99a24da_22076: systemtap: 1.7/0.152, base:
fffffff0
stap_4a5d3ffcea64eea9bd15945470c6d19_22316: systemtap: 1.7/0.152, base:
fffffff0
stap_2dd15849ef5f69d61b7c9b2fcb1b7c2_22556: systemtap: 1.7/0.152, base:
fffffff0
stap_c9075c8e6803117ec44383c623a08ef_22796: systemtap: 1.7/0.152, base:
fffffff0
stap_321b83fb9252afcfc3639c6d93aa5b7_23036: systemtap: 1.7/0.152, base:
fffffff0
stap_99052e6b872b32efbfb629721235d36_23276: systemtap: 1.7/0.152, base:
fffffff0
stap_6bbcd2573dab9d1076fa179f935c441_23516: systemtap: 1.7/0.152, base:
fffffff0
stap_f9312d8bae04919752574c65d93aaa6_23756: systemtap: 1.7/0.152, base:
fffffff0
stap_de64570653d1a21cdb8e6c0935633c2_23996: systemtap: 1.7/0.152, base:
fffffff0
stap_a61d2757f66e30c0e0218b148ab4dfa_24236: systemtap: 1.7/0.152, base:
fffffff0
stap_170bf3afde0724f92d05a3c35e72b79_24476: systemtap: 1.7/0.152, base:
fffffff0
stap_17df58d1777fa0e55934744cff7673f_24716: systemtap: 1.7/0.152, base:
fffffff0
stap_f875a6812b52f5b77f3351a6dbed1b0_24956: systemtap: 1.7/0.152, base:
fffffff0
stap_a59083aa77b019e7d8eeeb45f46f64b_25196: systemtap: 1.7/0.152, base:
fffffff0
stap_7df023d2a130fbe7e212ddccc89d9c8_25436: systemtap: 1.7/0.152, base:
fffffff0
stap_aef7a14ce57fb62c24fe9270c9cba62_25676: systemtap: 1.7/0.152, base:
fffffff0
stap_12c4b1a6414b56ce6cfb9fc1f3ff866_25916: systemtap: 1.7/0.152, base:
fffffff0
stap_557243b84bb16284e3a5eb72873d0e2_26156: systemtap: 1.7/0.152, base:
fffffff0
stap_6d3f1ee8b730a4b9fd55559391749ce_26171: systemtap: 1.7/0.152, base:
fffffff0
stap_a0319d4181f8150977aec294313305c_26187: systemtap: 1.7/0.152, base:
fffffff0
stap_43f459f3a1511a8dbea7cd536b2692a_26202: systemtap: 1.7/0.152, base:
fffffff0
stap_b5f0b925503679d35fe81ce7ce4bb64_26217: systemtap: 1.7/0.152, base:
fffffff0
stap_f7479031df6151a16052d52ccd3229f_26455: systemtap: 1.7/0.152, base:
fffffff0
stap_629339df3f5443e55e68db6bb99521f_26693: systemtap: 1.7/0.152, base:
fffffff0
stap_b6022e72bbbaeb7d6c901eaf8532954_26955: systemtap: 1.7/0.152, base:
fffffff0
stap_74fb0903402ae53e58589703ffc618e_27192: systemtap: 1.7/0.152, base:
fffffff0
stap_b38a1dc607edc6127338896862aaf24_27429: systemtap: 1.7/0.152, base:
fffffff0
stap_7e61007af13e66b328cd6bcf43e8204_27665: systemtap: 1.7/0.152, base:
fffffff0
stap_3b2a6c6c34b2d015a3239d393b69e6e_27903: systemtap: 1.7/0.152, base:
fffffff0
stap_ac00a3e64561b9e766b68e18a6b374a_28143: systemtap: 1.7/0.152, base:
fffffff0
stap_732af8998c13f2a2d7803cf2784e357_28180: systemtap: 1.7/0.152, base:
fffffff0
stap_301a3f18642f17397c8c1d0641cf2e7_28195: systemtap: 1.7/0.152, base:
fffffff0
stap_4e2eee4b70d50d68a79776223e664d0_28214: systemtap: 1.7/0.152, base:
fffffff0
stap_78dadc54b0bb05aa4da249f2ef6ea44_28228: systemtap: 1.7/0.152, base:
fffffff0
stap_d772048d24ff263be233a905cd2290b_28242: systemtap: 1.7/0.152, base:
fffffff0
stap_3a004dd992ca780e4e3d2f03b02ea7a_28256: systemtap: 1.7/0.152, base:
fffffff0
stap_b529d4af03a64a77b8a2221b010ed62_28270: systemtap: 1.7/0.152, base:
fffffff0
stap_bc381196a6b1333f7469a42d767254f_28284: systemtap: 1.7/0.152, base:
fffffff0
stap_20011319d85d8cb4348b7e4178331ee_28298: systemtap: 1.7/0.152, base:
fffffff0
stap_cb77720af8b8c98e8880014b96febaf_28312: systemtap: 1.7/0.152, base:
fffffff0
stap_762ae1764505116c3fb959a594e588a_28326: systemtap: 1.7/0.152, base:
fffffff0
stap_27ab0f045c37da028e10caf504af649_28345: systemtap: 1.7/0.152, base:
fffffff0
stap_78d447984a35de9b21e18cb0cab35a8_28359: systemtap: 1.7/0.152, base:
fffffff0
stap_b1011b2aafe027b8f8c0867b99a24da_28372: systemtap: 1.7/0.152, base:
fffffff0
stap_4a5d3ffcea64eea9bd15945470c6d19_28387: systemtap: 1.7/0.152, base:
fffffff0
stap_2dd15849ef5f69d61b7c9b2fcb1b7c2_28401: systemtap: 1.7/0.152, base:
fffffff0
stap_c9075c8e6803117ec44383c623a08ef_28415: systemtap: 1.7/0.152, base:
fffffff0
stap_321b83fb9252afcfc3639c6d93aa5b7_28429: systemtap: 1.7/0.152, base:
fffffff0
stap_99052e6b872b32efbfb629721235d36_28443: systemtap: 1.7/0.152, base:
fffffff0
stap_170bf3afde0724f92d05a3c35e72b79_28457: systemtap: 1.7/0.152, base:
fffffff0
stap_17df58d1777fa0e55934744cff7673f_28471: systemtap: 1.7/0.152, base:
fffffff0
stap_f875a6812b52f5b77f3351a6dbed1b0_28485: systemtap: 1.7/0.152, base:
fffffff0
stap_a59083aa77b019e7d8eeeb45f46f64b_28498: systemtap: 1.7/0.152, base:
fffffff0
stap_6bbcd2573dab9d1076fa179f935c441_28512: systemtap: 1.7/0.152, base:
fffffff0
stap_f9312d8bae04919752574c65d93aaa6_28527: systemtap: 1.7/0.152, base:
fffffff0
stap_de64570653d1a21cdb8e6c0935633c2_28600: systemtap: 1.7/0.152, base:
fffffff0
stap_a61d2757f66e30c0e0218b148ab4dfa_28614: systemtap: 1.7/0.152, base:
fffffff0
stap_aef7a14ce57fb62c24fe9270c9cba62_28687: systemtap: 1.7/0.152, base:
fffffff0
stap_7df023d2a130fbe7e212ddccc89d9c8_28701: systemtap: 1.7/0.152, base:
fffffff0
stap_12c4b1a6414b56ce6cfb9fc1f3ff866_28715: systemtap: 1.7/0.152, base:
fffffff0
stap_557243b84bb16284e3a5eb72873d0e2_28729: systemtap: 1.7/0.152, base:
fffffff0
stap_2d33a1bdbf5bbf70ba121e44b1f08a2_28743: systemtap: 1.7/0.152, base:
fffffff0
stap_d47936fcb49b72c6f17c4fd996210e0_28756: systemtap: 1.7/0.152, base:
fffffff0
stap_e2bb710e788d1876db77d75ca35724e_28771: systemtap: 1.7/0.152, base:
fffffff0
stap_41e233c33c9b9cf9ec0bdcb6f1daade_28785: systemtap: 1.7/0.152, base:
fffffff0
stap_d7a432da661a2367de2ecb0ca05d4e8_29023: systemtap: 1.7/0.152, base:
fffffff0
stap_a2708d6f29c6d53d87bd634031edeaf_29261: systemtap: 1.7/0.152, base:
fffffff0
stap_e6d078710d6700099e0ae59899e0783_29499: systemtap: 1.7/0.152, base:
fffffff0
stap_55ae3daf2a24145e8209bbdcdd042bf_29737: systemtap: 1.7/0.152, base:
fffffff0
stap_b6364a1497ea32d833fc4c2ef0c7526_29751: systemtap: 1.7/0.152, base:
fffffff0
stap_3114069fffb5b23274a5e31a9b9f763_29769: systemtap: 1.7/0.152, base:
fffffff0
stap_717eca9f7c6fbe5b7197ea9d96f7d36_29785: systemtap: 1.7/0.152, base:
fffffff0
stap_f35cb654815bfacf1b53db536249c3d_30025: systemtap: 1.7/0.152, base:
fffffff0
stap_d58904f2d11490984a1991d72f0014c_30265: systemtap: 1.7/0.152, base:
fffffff0
stap_3b9f0318213a3770a24162cfa1cdd30_30505: systemtap: 1.7/0.152, base:
fffffff0
stap_18e7ad77a58b066fff5d3b294b87c6e_30750: systemtap: 1.7/0.152, base:
fffffff0
stap_d571d7bb144cba8957ae1bcb4669f33_30993: systemtap: 1.7/0.152, base:
fffffff0
stap_a9b688a7ae856a1f12e8a3b6a320816_31233: systemtap: 1.7/0.152, base:
fffffff0
stap_35313cc5cfd1576ad558cd605bbd3c5_31473: systemtap: 1.7/0.152, base:
fffffff0
stap_7d24008e926c66c9b8b23fb1eea9a42_31713: systemtap: 1.7/0.152, base:
fffffff0
stap_6f97fd16b37afc021f6a9d8987f55c2_31953: systemtap: 1.7/0.152, base:
fffffff0
stap_02a5bf75d8291ca8c7df40cdcb5ec63_32193: systemtap: 1.7/0.152, base:
fffffff0
stap_1052840ca67f5d8427d4249b5962474_32433: systemtap: 1.7/0.152, base:
fffffff0
stap_dafb5f8c9066a5e9a49b76f9df8ab4b_32673: systemtap: 1.7/0.152, base:
fffffff0
stap_70cb6f31ed05fc0b69874a2f63dd2c10__456: systemtap: 1.7/0.152, base:
fffffff0
stap_932473e387bdcb45a4dcff1a25f62608__698: systemtap: 1.7/0.152, base:
fffffff0
stap_f94209c863a0d68af4e03b5dbe2155c6__950: systemtap: 1.7/0.152, base:
fffffff0
stap_ca2ef59fb68efa71453027a809906c94_1192: systemtap: 1.7/0.152, base:
fffffff0
stap_2c39bff475a8657f5a2e963379dea928_1453: systemtap: 1.7/0.152, base:
fffffff0
stap_9d18f77aa93f20dcef4a81e7b0185d66_1693: systemtap: 1.7/0.152, base:
fffffff0
stap_1718dfe05bcce6db5c88568fbac33f3b_1933: systemtap: 1.7/0.152, base:
fffffff0
stap_7764bcd6be8045b5242689b234ae8952_2173: systemtap: 1.7/0.152, base:
fffffff0
stap_a23af06f6cba198055c07faae5e3acc0_2413: systemtap: 1.7/0.152, base:
fffffff0
stap_d12e94a0315d2d29f57ad01287183200_2653: systemtap: 1.7/0.152, base:
fffffff0
stap_7f7ad5d649d1ac59dd9f6b188aac24d3_2895: systemtap: 1.7/0.152, base:
fffffff0
stap_bf816428d4c61cdd915b86a56be23799_3135: systemtap: 1.7/0.152, base:
fffffff0
stap_dd848112b6e84563ef43c020064451a9_3375: systemtap: 1.7/0.152, base:
fffffff0
stap_c8f75b93eecfffd05f7f1e31e053b796_3615: systemtap: 1.7/0.152, base:
fffffff0
stap_ea1ee8fa7035b9bc6913ef4ac1960fd9_3855: systemtap: 1.7/0.152, base:
fffffff0
stap_b5336ff09c0a2a29677a991d7bbcbb70_4095: systemtap: 1.7/0.152, base:
fffffff0
stap_44124397a66dfd69d5af4e886f22224f_4335: systemtap: 1.7/0.152, base:
fffffff0
stap_66d6c0adc18adb07eb2117137347b0a8_4575: systemtap: 1.7/0.152, base:
fffffff0
stap_261ed1e4b3bf7ce6d4482195fffcaa5f_4821: systemtap: 1.7/0.152, base:
fffffff0
stap_b9f2abf65f4dbd2f2b0cb728149c0be0_5075: systemtap: 1.7/0.152, base:
fffffff0
stap_dd4da8a2ae85219b4b14323dc3455988_5332: systemtap: 1.7/0.152, base:
fffffff0
stap_f28638e99f4a3a45042f9995524b84e9_5572: systemtap: 1.7/0.152, base:
fffffff0
stap_10a753790a79f58f0c11b06df9de0918_5814: systemtap: 1.7/0.152, base:
fffffff0
stap_3ff8332830e45e9128f8fec2a70598b1_6054: systemtap: 1.7/0.152, base:
fffffff0
stap_9e69509a0ca707d6da2b713f5f84dd62_6069: systemtap: 1.7/0.152, base:
fffffff0
stap_36c4803dd014be3dd2526c1ddce8bfc4_6084: systemtap: 1.7/0.152, base:
fffffff0
stap_bb10960fc043f10c4ebc85fbf73476d8_6100: systemtap: 1.7/0.152, base:
fffffff0
stap_490246a73d369a832ad59df4970d9b72_6115: systemtap: 1.7/0.152, base:
fffffff0
stap_20dcba22ef9427e76602d3d81cf3f827_6352: systemtap: 1.7/0.152, base:
fffffff0
stap_9bd335dc6e85e4ba962a94cafa3e6478_6591: systemtap: 1.7/0.152, base:
fffffff0
stap_a1a390da0ee6e05ae9b8d296e83ef436_6853: systemtap: 1.7/0.152, base:
fffffff0
stap_f3539924c35cbbb1c41a494499449103_7089: systemtap: 1.7/0.152, base:
fffffff0
stap_ba2736e4411e2fc5f491e0701a9bd6e7_7325: systemtap: 1.7/0.152, base:
fffffff0
stap_ff3c01b7e8524d5026be621f66733171_7562: systemtap: 1.7/0.152, base:
fffffff0
stap_2de7eb12df855cfbf00871a812fb6ee0_7799: systemtap: 1.7/0.152, base:
fffffff0
stap_d2ea48b6ca27e7031473951f2de84aac_8035: systemtap: 1.7/0.152, base:
fffffff0
stap_340e7940782fb1c5c0261368ca27a438_8272: systemtap: 1.7/0.152, base:
fffffff0
stap_6d0e93bc133de4cea3aa3fe1305a9fe4_8512: systemtap: 1.7/0.152, base:
fffffff0
stap_d58904f2d11490984a1991d72f0014cb_8550: systemtap: 1.7/0.152, base:
fffffff0
stap_3b9f0318213a3770a24162cfa1cdd30f_8564: systemtap: 1.7/0.152, base:
fffffff0
stap_18e7ad77a58b066fff5d3b294b87c6e6_8584: systemtap: 1.7/0.152, base:
fffffff0
stap_d571d7bb144cba8957ae1bcb4669f330_8597: systemtap: 1.7/0.152, base:
fffffff0
stap_f35cb654815bfacf1b53db536249c3d8_8612: systemtap: 1.7/0.152, base:
fffffff0
stap_a9b688a7ae856a1f12e8a3b6a320816d_8625: systemtap: 1.7/0.152, base:
fffffff0
stap_35313cc5cfd1576ad558cd605bbd3c5e_8640: systemtap: 1.7/0.152, base:
fffffff0
stap_6f97fd16b37afc021f6a9d8987f55c25_8654: systemtap: 1.7/0.152, base:
fffffff0
stap_02a5bf75d8291ca8c7df40cdcb5ec63a_8668: systemtap: 1.7/0.152, base:
fffffff0
stap_fecc97de8277719dad6fc9c0ad2f2ae1_8906: systemtap: 1.7/0.152, base:
fffffff0
stap_ac0136dae229e1c50610fa2156bdc5c0_9129: systemtap: 1.7/0.152, base:
fffffff0
stap_f94209c863a0d68af4e03b5dbe2155c6_9138: systemtap: 1.7/0.152, base:
fffffff0
stap_ca2ef59fb68efa71453027a809906c94_9147: systemtap: 1.7/0.152, base:
fffffff0
stap_2c39bff475a8657f5a2e963379dea928_9160: systemtap: 1.7/0.152, base:
fffffff0
stap_9d18f77aa93f20dcef4a81e7b0185d66_9169: systemtap: 1.7/0.152, base:
fffffff0
stap_1718dfe05bcce6db5c88568fbac33f3b_9178: systemtap: 1.7/0.152, base:
fffffff0
stap_7764bcd6be8045b5242689b234ae8952_9187: systemtap: 1.7/0.152, base:
fffffff0
stap_a23af06f6cba198055c07faae5e3acc0_9196: systemtap: 1.7/0.152, base:
fffffff0
stap_d12e94a0315d2d29f57ad01287183200_9205: systemtap: 1.7/0.152, base:
fffffff0
stap_7f7ad5d649d1ac59dd9f6b188aac24d3_9214: systemtap: 1.7/0.152, base:
fffffff0
stap_bf816428d4c61cdd915b86a56be23799_9223: systemtap: 1.7/0.152, base:
fffffff0
stap_44124397a66dfd69d5af4e886f22224f_9232: systemtap: 1.7/0.152, base:
fffffff0
stap_66d6c0adc18adb07eb2117137347b0a8_9241: systemtap: 1.7/0.152, base:
fffffff0
stap_261ed1e4b3bf7ce6d4482195fffcaa5f_9250: systemtap: 1.7/0.152, base:
fffffff0
stap_b9f2abf65f4dbd2f2b0cb728149c0be0_9259: systemtap: 1.7/0.152, base:
fffffff0
stap_dd848112b6e84563ef43c020064451a9_9268: systemtap: 1.7/0.152, base:
fffffff0
stap_c8f75b93eecfffd05f7f1e31e053b796_9277: systemtap: 1.7/0.152, base:
fffffff0
stap_ea1ee8fa7035b9bc6913ef4ac1960fd9_9286: systemtap: 1.7/0.152, base:
fffffff0

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (8 preceding siblings ...)
  2012-01-26 22:35 ` dsmith at redhat dot com
@ 2012-01-26 22:58 ` jistone at redhat dot com
  2012-01-27 14:52 ` dsmith at redhat dot com
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: jistone at redhat dot com @ 2012-01-26 22:58 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Josh Stone <jistone at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jistone at redhat dot com

--- Comment #9 from Josh Stone <jistone at redhat dot com> 2012-01-26 22:57:36 UTC ---
(In reply to comment #8)
> When testing this on RHEL6 (2.6.32-220.4.1.el6.x86_64.debug), I just saw the
> following (which is still better than an oops):
> 
> BUG: MAX_LOCKDEP_KEYS too low!
> turning off the locking correctness validator.

See the last section in linux/Documentation/lockdep-design.txt -- in summary,
the leading causes are (1) every module load/unload leaks lock classes, and (2)
statically initialized locks create separate classes.

It might be possible for us to reduce the latter effect, but we can't help the
former.  Either way, it's not any sign that we have a problem, just that we've
caused lockdep to exceed its internal maximum.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (9 preceding siblings ...)
  2012-01-26 22:58 ` jistone at redhat dot com
@ 2012-01-27 14:52 ` dsmith at redhat dot com
  2012-01-27 15:10 ` dsmith at redhat dot com
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2012-01-27 14:52 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #10 from David Smith <dsmith at redhat dot com> 2012-01-27 14:51:13 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > When testing this on RHEL6 (2.6.32-220.4.1.el6.x86_64.debug), I just saw the
> > following (which is still better than an oops):
> > 
> > BUG: MAX_LOCKDEP_KEYS too low!
> > turning off the locking correctness validator.
> 
> See the last section in linux/Documentation/lockdep-design.txt -- in summary,
> the leading causes are (1) every module load/unload leaks lock classes, and (2)
> statically initialized locks create separate classes.
> 
> It might be possible for us to reduce the latter effect, but we can't help the
> former.  Either way, it's not any sign that we have a problem, just that we've
> caused lockdep to exceed its internal maximum.

Right.

If that list of systemtap modules is loaded concurrently, I think I'll need to
look at the testcase.  I'd bet that wasn't intended.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (10 preceding siblings ...)
  2012-01-27 14:52 ` dsmith at redhat dot com
@ 2012-01-27 15:10 ` dsmith at redhat dot com
  2012-01-27 16:20 ` jistone at redhat dot com
  2012-01-30 22:05 ` fche at redhat dot com
  13 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2012-01-27 15:10 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #11 from David Smith <dsmith at redhat dot com> 2012-01-27 15:09:46 UTC ---
I ran the unprivileged_myproc.exp in a loop last night on 2 borrowed systems:

1) RHEL5 (2.6.18-305.el5debug - x86_64) - Each test run takes normally around
13-15 minutes to complete on this system.  But, this system only made it
through 13 iterations of the test (in around 10 hours).  Looking through the
/var/log/messages, I see this:

Jan 27 04:05:23 hp-sl2x170zg6-02 kernel: INFO: task stapio:16032 blocked for
mor
e than 120 seconds.
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel: "echo 0 >
/proc/sys/kernel/hung_task_ti
meout_secs" disables this message.
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel: stapio        D ffff8101bcd64088     0 
16032  16029         16038       (NOTLB)
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  ffff81018fe09dc8 0000000000000046
0000
000000000000 0000000000000046
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  ffff8101bce008d0 0000000000000007
ffff
81018fc3e080 ffff8101bd0cc600
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  00001edff65e7937 000000000000178f
ffff
81018fc3e268 000000018be2c720
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel: Call Trace:
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff88657c6e>]
:uprobes:__unregi
ster_uprobe+0x1ef/0x22f
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff800a811e>]
autoremove_wake_f
unction+0x0/0x2e
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff8866babe>]
:stap_9846269314f
b177561661c8b9fe159f3_2043:_stp_cleanup_and_exit+0x140/0x311
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff8866bdbc>]
:stap_9846269314f
b177561661c8b9fe159f3_2043:_stp_ctl_write_cmd+0xf1/0xdbd
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff8013d5fb>]
file_has_perm+0x4
8/0xa3
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff80017800>]
vfs_write+0xce/0x
174
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff800180f4>]
sys_write+0x45/0x
6e
Jan 27 04:05:23 hp-sl2x170zg6-02 kernel:  [<ffffffff800602a6>]
tracesys+0xd5/0xd
f

It is possible this is a separate problem.

2) On RHEL6 (2.6.32-220.4.1.el6.x86_64.debug), the system did 57 iterations
with no OOPS.  However, I do see the following in /var/log/messages:

Jan 27 01:41:01 dell-pe1955-01 kernel: Slab corruption (Not tainted):
utrace_engine start=ffff880037ae5c88, len=56
Jan 27 01:41:01 dell-pe1955-01 kernel: Redzone:
0x9f911029d74e35b/0x9f911029d74e35b.
Jan 27 01:41:01 dell-pe1955-01 kernel: Last user:
[<ffffffff810d9179>](__utrace_engine_release+0x39/0x50)
Jan 27 01:41:01 dell-pe1955-01 kernel: 020: e0 9c 54 81 ff ff ff ff 6b 6b 6b 6b
6b 6b 6b 6b
Jan 27 01:41:01 dell-pe1955-01 kernel: 030: 01 00 00 00 00 00 00 00
Jan 27 01:41:01 dell-pe1955-01 kernel: Prev obj: start=ffff880037ae5c38, len=56
Jan 27 01:41:01 dell-pe1955-01 kernel: Redzone:
0xd84156c5635688c0/0xd84156c5635688c0.
Jan 27 01:41:01 dell-pe1955-01 kernel: Last user:
[<ffffffff810d926a>](utrace_attach_task+0xba/0x2b0)
Jan 27 01:41:01 dell-pe1955-01 kernel: 000: 01 00 00 00 5a 5a 5a 5a 00 00 00 00
00 00 00 00
Jan 27 01:41:01 dell-pe1955-01 kernel: 010: 08 86 7a c7 00 88 ff ff e8 8c 7a c7
00 88 ff ff
Jan 27 01:41:02 dell-pe1955-01 kernel: Next obj: start=ffff880037ae5cd8, len=56
Jan 27 01:41:02 dell-pe1955-01 kernel: Redzone:
0xd84156c5635688c0/0xd84156c5635688c0.
Jan 27 01:41:02 dell-pe1955-01 kernel: Last user:
[<ffffffff810d926a>](utrace_attach_task+0xba/0x2b0)
Jan 27 01:41:02 dell-pe1955-01 kernel: 000: 01 00 00 00 5a 5a 5a 5a 00 00 00 00
00 00 00 00
Jan 27 01:41:02 dell-pe1955-01 kernel: 010: e8 5c ae 37 00 88 ff ff e8 5c ae 37
00 88 ff ff

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (11 preceding siblings ...)
  2012-01-27 15:10 ` dsmith at redhat dot com
@ 2012-01-27 16:20 ` jistone at redhat dot com
  2012-01-30 22:05 ` fche at redhat dot com
  13 siblings, 0 replies; 17+ messages in thread
From: jistone at redhat dot com @ 2012-01-27 16:20 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

--- Comment #12 from Josh Stone <jistone at redhat dot com> 2012-01-27 16:19:56 UTC ---
(In reply to comment #10)
> If that list of systemtap modules is loaded concurrently, I think I'll need to
> look at the testcase.  I'd bet that wasn't intended.

In my understanding of that lockdep documentation, it's not concurrency that
matters.  Even a single module repeatedly loaded and unloaded will leak and
eventually exhaust the lockdep classes.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
  2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
                   ` (12 preceding siblings ...)
  2012-01-27 16:20 ` jistone at redhat dot com
@ 2012-01-30 22:05 ` fche at redhat dot com
  13 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2012-01-30 22:05 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |13634

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-01-30 22:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-21 22:38 [Bug uprobes/13539] New: occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131 fche at redhat dot com
2011-12-21 22:59 ` [Bug uprobes/13539] " fche at redhat dot com
2011-12-22  1:38 ` fche at redhat dot com
2011-12-22 12:20 ` fche at redhat dot com
2011-12-31 14:27 ` fche at redhat dot com
2012-01-02 22:31 ` fche at redhat dot com
2012-01-06 12:25   ` Srikar Dronamraju
2012-01-06 17:01     ` David Smith
2012-01-06 16:57 ` dsmith at redhat dot com
2012-01-06 17:01 ` dsmith at redhat dot com
2012-01-26 16:10 ` fche at redhat dot com
2012-01-26 22:35 ` dsmith at redhat dot com
2012-01-26 22:58 ` jistone at redhat dot com
2012-01-27 14:52 ` dsmith at redhat dot com
2012-01-27 15:10 ` dsmith at redhat dot com
2012-01-27 16:20 ` jistone at redhat dot com
2012-01-30 22:05 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).