From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 13896 invoked by alias); 2 Jan 2012 22:31:30 -0000
Received: (qmail 13886 invoked by uid 22791); 2 Jan 2012 22:31:29 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO sourceware.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 02 Jan 2012 22:31:15 +0000
From: "fche at redhat dot com"
To: systemtap@sourceware.org
Subject: [Bug uprobes/13539] occasional oops, kernel SEGV, RHEL5, :uprobes:uprobe_free_process+0xba/0x131
Date: Mon, 02 Jan 2012 22:31:00 -0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: systemtap
X-Bugzilla-Component: uprobes
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: fche at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: systemtap at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields: Priority Severity
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2012-q1/txt/msg00000.txt.bz2

http://sourceware.org/bugzilla/show_bug.cgi?id=13539

Frank Ch. Eigler changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2
           Severity|critical                    |normal

--- Comment #4 from Frank Ch. Eigler 2012-01-02 22:30:45 UTC ---
Some code from Jim Keniston, later adapted by yours truly, makes some
progress on the race conditions present in runtime/uprobes{,2}.  It seems
like something more dramatic will be required.

I pushed my current working changes to the new branch pr13539.
Test it with

# make installcheck RUNTESTFLAGS=unprivileged_myproc.exp

on an SMP (virtual?) machine.  I can reproduce some problem or another on a
rhel5 2.6.18-300{,debug}.{i686,x86_64} box easily, and less easily on
rhel6/fedoras.  (On the latter, run it in a loop.)

The original race condition was that the "./loop 1" program's threads killed
themselves right around the same time as the stap module decided to
unregister probes due to the probe handler's { exit() }.  The effect is that
the suicide uprobe_report_* callbacks race with uprobe_free_{task,process}
coming in from uprobe_put_process.  One of them ends up deallocating the
uprobe_proc struct, the other ends up trying to take a semaphore, or muck
with a hlist node, in the resulting freed block.

The current state of the pr13539 branch works around some of the various
possible races, but now gets stuck in the post-exit (?) utrace-quiesce loop
of the target "./loop 1" process:

[1480353.094558] stap_3960d10ec2d1cdbbd5924a89713e08c4_2157: systemtap: 1.7/0.152, base: ffffffff88740000, memory: 94data/25text/2ctx/2058net/34alloc kb, probes: 2, unpriv-uid: 0
[1480353.107405] uprobe_report_clone ffff81003c627138 14025=14025
[1480353.118383] uprobe_report_clone2 ffff81003c627138 14025=14025
[1480353.122169] uprobe_report_exit ffff81003c627138 14025=14028
[1480353.125266] uprobe_report_quiesce ffff81003c627138 14025=14025
[1480353.128373] uprobe_report_quiesce2 ffff81003c627138 14025=14025
[1480353.130829] uprobe_report_quiesce3 ffff81003c627138 14025=14025
[1480353.133212] uprobe_report_exit1a ffff81003c627138 14025=14028
[1480353.135620] uprobe_report_exit2 ffff81003c627138 14025=14028
[1480353.138275] uprobe_free_task ffff81000f69aa48 (tid 14028), caller ffffffff88718bfcS, ctid 14028
[1480353.142031] uprobe_report_exit3 ffff81003c627138 14025=14028
[1480353.144330] uprobe_report_exit4 ffff81003c627138 14025=14028
[1480353.157461] uprobe_free_process ffff81003c627138 (pid 14025), caller ffffffff88717048S, ctid 14028
[1480353.161439] uprobe_free_task ffff81000f69a5e8 (tid 14025), caller ffffffff88716fb2S, ctid 14028
[1480353.165132] uprobe_free_process zap ffff81003c627138

[sysrq-t sez: ...]

[1486034.240725] stap X ffff8100131a9588 0 13933 12033 (L-TLB)
[1486034.245729] ffff810010a0df08 0000000000000046 ffff810019064d60 0000000000000246
[1486034.263375] ffff810013733e70 0000000000000009 ffff810019064700 ffffffff8032ed40
[1486034.269112] 0005425fab97e401 00000000007b7869 ffff8100190648e8 0000000013733e60
[1486034.273530] Call Trace:
[1486034.276317] [] check_dead_utrace+0x11c/0x185
[1486034.278710] [] do_exit+0x96c/0x978
[1486034.280831] [] debug_mutex_init+0x0/0x3b
[1486034.283114] [] tracesys+0xd5/0xdf
[1486034.285251]
[1486034.286547] loop t ffff8100218fb148 0 14025 1 3882 (NOTLB)
[1486034.301736] ffff810010a17cf8 0000000000000046 0000000000000246 ffffffff802a3700
[1486034.306797] ffffffff8871d0a0 0000000000000007 ffff81000fbe25c0 ffff810014d54340
[1486034.311027] 0005425f92a59fff 0000000000796f2c ffff81000fbe27a8 0000000200000001
[1486034.314436] Call Trace:
[1486034.317175] [] utrace_quiescent+0xe6/0x26d
[1486034.320411] [] utrace_get_signal+0x4f8/0x55b
[1486034.323536] [] get_signal_to_deliver+0x5a/0x4b9
[1486034.326940] [] get_signal_to_deliver+0x186/0x4b9
[1486034.339013] [] do_notify_resume+0x9c/0x7b0
[1486034.341901] [] default_wake_function+0x0/0xe
[1486034.350764] [] do_fork+0x148/0x1c1
[1486034.353004] [] trace_hardirqs_off_thunk+0x35/0x67
[1486034.355512] [] int_signal+0x12/0x17

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
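
For illustration only: the original race described in the comment (one path
deallocates the uprobe_proc struct while the other still touches it) is the
kind of problem usually handled by giving each path its own reference and
letting whoever drops the last one do the free.  Below is a minimal userspace
sketch of that ownership rule.  It is not the runtime/uprobes code and not
what the pr13539 branch does; every name in it (proc_state, exit_report_path,
unregister_path, run_teardown) is hypothetical, and the two paths are driven
sequentially in both orders rather than on real threads so the result is
deterministic.

```c
/* Userspace sketch only: hypothetical names, not the runtime/uprobes code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-process uprobe_proc struct. */
struct proc_state {
    atomic_int refcount; /* one reference per path that may still touch us */
    int touches;         /* stands in for "take a semaphore / edit an hlist node";
                            real concurrent code would need its own locking here */
    int *free_count;     /* lets the caller observe how often we were freed */
};

static struct proc_state *proc_state_new(int *free_count)
{
    struct proc_state *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    atomic_init(&p->refcount, 1); /* the creator holds the first reference */
    p->touches = 0;
    p->free_count = free_count;
    return p;
}

static void proc_state_get(struct proc_state *p)
{
    atomic_fetch_add(&p->refcount, 1);
}

/* Drop one reference; only the caller that drops the last one frees. */
static void proc_state_put(struct proc_state *p)
{
    int old = atomic_fetch_sub(&p->refcount, 1);
    assert(old > 0); /* a put without a matching reference is a bug */
    if (old == 1) {
        (*p->free_count)++;
        free(p);
    }
}

/* Models a report callback arriving from an exiting thread. */
static void exit_report_path(struct proc_state *p)
{
    p->touches++; /* safe: this path still holds its own reference */
    proc_state_put(p);
}

/* Models the unregister/teardown path. */
static void unregister_path(struct proc_state *p)
{
    p->touches++; /* safe for the same reason */
    proc_state_put(p);
}

/*
 * Runs both paths in the requested order and returns how many times the
 * struct was freed, or -1 if allocation failed.
 */
int run_teardown(int exit_first)
{
    int free_count = 0;
    struct proc_state *p = proc_state_new(&free_count);
    if (!p)
        return -1;
    proc_state_get(p); /* second reference, so each path owns one */
    if (exit_first) {
        exit_report_path(p);
        unregister_path(p);
    } else {
        unregister_path(p);
        exit_report_path(p);
    }
    return free_count;
}
```

Calling run_teardown(1) and run_teardown(0) exercises both orderings; in each
case neither path touches the struct after giving up its reference, and the
free happens in whichever path runs last.  Whether this pattern alone would be
enough for the uprobes/utrace interaction is an open question; the hang in the
utrace-quiesce loop shown in the log is a separate symptom that this sketch
does not model.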