From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 413 invoked by alias); 16 Nov 2018 04:56:50 -0000
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: systemtap-owner@sourceware.org
Received: (qmail 380 invoked by uid 48); 16 Nov 2018 04:56:45 -0000
From: "datong at openresty dot com"
To: systemtap@sourceware.org
Subject: [Bug translator/23891] New: stap and stapio process are stuck in signal processor and could not terminate properly
Date: Fri, 16 Nov 2018 04:56:00 -0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: systemtap
X-Bugzilla-Component: translator
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: datong at openresty dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: systemtap at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2018-q4/txt/msg00104.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=23891

            Bug ID: 23891
           Summary: stap and stapio process are stuck in signal processor and
                    could not terminate properly
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: translator
          Assignee: systemtap at sourceware dot org
          Reporter: datong at openresty dot com
  Target Milestone: ---

Hi,

We have observed that the stap process (and its associated stapio process)
sometimes cannot be terminated with SIGTERM on one of our production systems.

strace shows the following:

+ timeout 5s strace -p 8596 -s 1024
Process 8596 attached
write(2, "Too many interrupts received, exiting.\n", 39Process 8596 detached
+ true
+ timeout 5s strace -p 9110 -s 1024
Process 9110 attached
write(2, "WARNING:", 8Process 9110 detached
+ true

The full backtraces of the stap and stapio processes:

+ gstack 8596
#0  0x00007f8cc36bc420 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000412308 in handle_interrupt () at main.cxx:280
#2
#3  0x00007f8cc36bceca in __libc_waitpid (pid=9110, stat_loc=0x7ffe7df20abc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#4  0x0000000000597d8a in stap_waitpid (verbose=0, pid=9110) at util.cxx:749
#5  0x0000000000606cd9 in direct::finish (this=0x23fc830) at remote.cxx:108
#6  0x0000000000603222 in remote::run (remotes=std::vector of length 1, capacity 1 = {...}) at remote.cxx:1292
#7  0x00000000004126cc in pass_5 (s=..., targets=std::vector of length 1, capacity 1 = {...}) at main.cxx:1209
#8  0x000000000040fa54 in main (argc=, argv=) at main.cxx:1429

+ gstack 9110
Thread 3 (Thread 0x7f2ad693d700 (LWP 9112)):
#0  0x00007f2ad6d0e101 in do_sigwait (sig=0x7f2ad693cef4, set=) at ../sysdeps/unix/sysv/linux/sigwait.c:61
#1  __sigwait (set=set@entry=0x628170, sig=sig@entry=0x7f2ad693cef4) at ../sysdeps/unix/sysv/linux/sigwait.c:99
#2  0x0000000000402db5 in signal_thread (arg=0x628170) at mainloop.c:40
#3  0x00007f2ad6d06dc5 in start_thread (arg=0x7f2ad693d700) at pthread_create.c:308
#4  0x00007f2ad6a3573d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 2 (Thread 0x7f2ad613c700 (LWP 9113)):
#0  0x00007f2ad6d0d43d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000407618 in reader_thread (data=0x0) at relay.c:235
#2  0x00007f2ad6d06dc5 in start_thread (arg=0x7f2ad613c700) at pthread_create.c:308
#3  0x00007f2ad6a3573d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 1 (Thread 0x7f2ad712b740 (LWP 9110)):
#0  0x00007f2ad6a26c7d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f2ad69b3f73 in _IO_new_file_write (f=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, data=0x7ffe0aaa9210, n=8) at fileops.c:1301
#2  0x00007f2ad69b470f in new_do_write (to_do=8, data=0x7ffe0aaa9210 "WARNING:acing...\n", fp=0x7f2ad6cf91c0 <_IO_2_1_stderr_>) at fileops.c:537
#3  _IO_new_file_xsputn (f=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, data=, n=8) at fileops.c:1383
#4  0x00007f2ad698a47d in buffered_vfprintf (s=s@entry=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, format=format@entry=0x40a648 "WARNING:", args=args@entry=0x7ffe0aaab850) at vfprintf.c:2340
#5  0x00007f2ad698531e in _IO_vfprintf_internal (s=s@entry=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, format=0x40a648 "WARNING:", ap=0x7ffe0aaab850) at vfprintf.c:1289
#6  0x00007f2ad6a4aefd in ___vfprintf_chk (fp=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, flag=flag@entry=1, format=, ap=ap@entry=0x7ffe0aaab850) at vfprintf_chk.c:34
#7  0x0000000000405172 in vfprintf (__ap=0x7ffe0aaab850, __fmt=, __stream=) at /usr/include/bits/stdio2.h:127
#8  eprintf (fmt=) at common.c:693
#9  0x0000000000404c55 in stp_main_loop () at mainloop.c:810
#10 0x000000000040279d in main (argc=, argv=0x7ffe0aaadc68) at stapio.c:73

At this point, sending SIGTERM to either process has no effect, and the kernel
module remains loaded.

Best,
Datong

-- 
You are receiving this mail because:
You are the assignee for the bug.