Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Date: Tue, 03 Aug 2010 12:27:00 -0000
From: Oleg Nesterov
To: Jan Kratochvil
Cc: Roland McGrath, archer@sourceware.org, utrace-devel@redhat.com
Subject: Q: %Stop && gdb crash
Message-ID: <20100803122434.GA32698@redhat.com>
References: <20100716205147.GA26313@redhat.com> <20100721170400.GA30978@redhat.com> <20100721204203.D040C400B6@magilla.sf.frob.com> <20100723173134.GA29717@redhat.com> <20100726142759.GA17171@redhat.com> <20100728181702.GA26678@redhat.com> <20100802235358.GA9720@host1.dyn.jankratochvil.net>
In-Reply-To: <20100802235358.GA9720@host1.dyn.jankratochvil.net>
X-SW-Source: 2010-q3/txt/msg00078.txt.bz2

On 08/03, Jan Kratochvil wrote:
>
> On Wed, 28 Jul 2010 20:17:02 +0200, Oleg Nesterov wrote:
>
> > Btw, gdb crashes very often right after
> >
> >     (gdb) set target-async on
> >     (gdb) set non-stop
> >     (gdb) file mt-program
> >     (gdb) target extended-remote :port
> >     (gdb) attach its_pid
> >
> > I didn't even try to investigate (this doesn't happen when
> > it works with the real gdbserver). Just retry, gdb is buggy.
                                                    ^^^^^^^^^^^^
Yes, I still think gdb is wrong, but please correct me.

> Trying it with both /bin/sleep and a threaded testcase and I never got a crash
> (kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).

To clarify, let me repeat: I never saw such a crash with the real
gdbserver, but this often happens in my testing.

I think I understand what happens. And this leads to the question about
the %Stop notifications which I was going to delay, see below.

I just reproduced the crash. I entered the following commands via the CLI
interface:

    (gdb) set target-async on
    (gdb) set non-stop
    (gdb) target extended-remote :2000
    (gdb) file mt

Everything is OK so far. "mt" is not interesting, just a simple
application with 4 sleeping threads. Then gdb crashes during attach:

    (gdb) attach 24291
    Attached to process 24291
    [New Thread 24291.24291]
    [New Thread 24291.24292]
    [New Thread 24291.24293]
    [New Thread 24291.24294]
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libpthread.so.0
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    0x000000375faf21ce in __lll_lock_wait_private () from /lib64/libc.so.6
    (gdb) [Thread 24291.24293] #3 stopped.
    0x000000375faf21ce in __lll_lock_wait_private () from /lib64/libc.so.6
    [Thread 24291.24292] #2 stopped.
    0x000000375fad65cb in read () from /lib64/libc.so.6
    [Thread 24291.24291] #1 stopped.
    0x00000033af60e57d in pause () from /lib64/libpthread.so.0
    inline-frame.c:335: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.
    A problem internal to GDB has been detected,

And I think this is because of %Stop issues. From gdb.info

    Because the notification mechanism is unreliable, the stub is
    permitted to resend a stop reply notification if it believes GDB
    may not have received it. GDB ignores additional stop reply
    notifications received before it has finished processing a previous
    notification and the stub has completed sending any queued stop
    events.

So I assumed it is always safe to resend the notification unless gdb
already sent vStopped.

Since it is not clear to me when it makes sense to resend it, currently
gdbstub does re-send every time /proc/ugdb reports a new event (T00 in
this case). I agree this is not optimal, but this looks correct to me.

However, gdb.info also states:

    Only one stop reply notification at a time may be pending; if
    additional stop events occur before GDB has acknowledged the
    previous notification, they must be queued by the stub for later
    synchronous transmission in response to `vStopped' packets from
    GDB.

That is why gdbstub re-sends the same notification, until it gets
vStopped.

Now let's look into the log:

    => vAttach;5ee3
    <= OK
    => qfThreadInfo
    <= mp5ee3.5ee3,p5ee3.5ee4,p5ee3.5ee5,p5ee3.5ee6
    => qsThreadInfo
    <= l
    => Hgp5ee3.5ee3
    <= OK
    => vCont?
    <= vCont;t;c;C;s;S
    => vCont;t:p5ee3.-1
    <= OK

Note: gdbstub reports OK before any thread actually stops. I believe this
is correct from the remote protocol pov, and this is what we want.

    <= Stop:T00thread:p5ee3.5ee4;

Some thread actually stops, we sent the notification.

    <= Stop:T00thread:p5ee3.5ee4;

Another thread stops, gdbstub resends the same notification according to
the docs above (or according to my understanding).

Note: this doesn't happen _every time_. In the more likely case all
threads are already stopped when ->poll() succeeds. But sometimes some
thread stops a little bit later.

Once again, please note that both notifications are the same thing, but
I guess gdb doesn't understand this, see below.

Then,

    => vStopped
    <= T00thread:p5ee3.5ee3;
    => vStopped
    <= T00thread:p5ee3.5ee6;
    => vStopped
    <= T00thread:p5ee3.5ee5;
    => vStopped
    <= OK
    => Hgp5ee3.5ee4
    <= OK
    => g
    <= 00feffffffffffffa066d65f37000000ffffffffffffffff00040000000...

    [...snip a lot of $m packets ]

everything is fine so far. Then,

    => vCont;t:p5ee3.-1
    <= OK

Well. I hope this 'OK' without the subsequent notifications matches the
documentation:

    vCont[;ACTION[:THREAD-ID]]...
    ...
    The `t' action is only relevant in non-stop mode ...
    A stop reply should be generated for any affected thread not
    already stopped.

IIUC, "already stopped" means "already reported as stopped to gdb".

So gdbstub replies 'OK' and doesn't send any %Stop packets, but gdb seems
to expect the new STOP-REPLY packets:

    => m375fad65cb,1
    <= 48
    => m375fad65cb,1
    <= 48
    => vStopped

And what should I do in this case??? Probably, this vStopped pairs the
_second_ notification above. But gdbstub has already acked this
notification during the previous vStopped sequence.

E01? This seems to confuse gdb. From gdb.info

    `vStopped'
         In non-stop mode (*note Remote Non-Stop::), acknowledge a
         previous stop reply and prompt for the stub to report another
         one.

         Reply:
        `Any stop packet'
              if there is another unreported stop event (*note Stop
              Reply Packets::)

        `OK'
              if there are no unreported stop events

So I am sending 'OK' because there are no unreported stop events. But this
seems to confuse gdb, it thinks that this 'OK' acks the second
notification,

    <= OK
    => Hgp5ee3.5ee3
    <= OK
    => g
    <= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
    => m33af60e57d,1
    <= 48
    => m33af60e57d,1
    <= 48
    => Hgp5ee3.5ee6
    <= OK
    => g
    <= 00feffffffffffffa066d65f37000000ffffffffffffffff02000000000...
    => m375faf21ce,1
    <= 89
    => m375faf21ce,1
    <= 89
    => Hgp5ee3.5ee3
    <= OK
    => g
    <= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
    => m33af60e57d,1
    <= 48
    => m33af60e57d,1
    <= 48
    => Hgp5ee3.5ee5
    <= OK
    => g
    <= 00feffffffffffffa066d65f37000000ffffffffffffffff02000000000...
    => m375faf21ce,1
    <= 89
    => m375faf21ce,1
    <= 89
    => Hgp5ee3.5ee3
    <= OK
    => g
    <= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
    => m33af60e57d,1
    <= 48
    => m33af60e57d,1
    <= 48
    => Hgp5ee3.5ee4
    <= OK

Note: _this_ thread was reported twice via %Stop.

    => g
    <= 00feffffffffffffa066d65f37000000ffffffffffffffff00040000000...

Amen, gdb crashes. Indeed, it has already looked at this thread (see
another Hgp5ee3.5ee4 above).
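
The stub-side bookkeeping behind the log above can be sketched as a toy
model. This is not ugdb's code: the class `StubModel`, its methods
`poll_wakeup` and `vstopped`, and the grouping of the last three threads
into a single wakeup are all assumptions made for illustration, and packet
framing ('$', '#' and the checksum) is left out. It only replays the
resend-until-vStopped rule as described in this mail.

```python
from collections import deque


class StubModel:
    """Toy model of the stub side of %Stop / vStopped (hypothetical)."""

    def __init__(self):
        self.queue = deque()    # stop events not yet handed to gdb
        self.notified = None    # payload of the %Stop that is not acked yet
        self.busy = False       # True from the first %Stop until vStopped gets OK

    def poll_wakeup(self, tids):
        """New stop events arrive; return the notification to send, if any."""
        events = [f"T00thread:{tid};" for tid in tids]
        if not self.busy and events:
            self.busy = True
            self.notified = events.pop(0)
        self.queue.extend(events)
        if self.notified is not None:
            # Not acked yet: (re)send the very same notification.
            return "%Stop:" + self.notified
        return None

    def vstopped(self):
        """gdb sent vStopped: ack, then report the next queued event or OK."""
        self.notified = None
        if self.queue:
            return self.queue.popleft()
        self.busy = False
        return "OK"


stub = StubModel()
first = stub.poll_wakeup(["p5ee3.5ee4"])
# Assumption: the remaining three threads are reported by one later wakeup.
again = stub.poll_wakeup(["p5ee3.5ee3", "p5ee3.5ee6", "p5ee3.5ee5"])
replies = [stub.vstopped() for _ in range(4)]
extra = stub.vstopped()   # gdb's vStopped for the duplicate notification
```

In this model `first` and `again` are the same packet, the four `replies`
drain the queue and end with 'OK', and `extra` is a second 'OK' because
nothing is left to report. That last reply is the one gdb appears to take
as the end of a new stop-event sequence for the thread it has already seen.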

Jan, I am not sure but _IIRC_ I observed other scenarios when gdb crashes
during the attach, but can't reproduce right now.

==========================================================================
Now, let's talk about %Stop. I must admit, I believe the idea behind %Stop
in its current state is not very good.

First of all, it is not clear how this all can be implemented correctly.
Forget about the multithreading, consider the simplest case: gdb traces
a single thread, this thread stops, gdbserver sends
'%Stop:T00thread:pPID.PID;'. From gdb.info:

    After receiving a stop reply notification, GDB shall acknowledge it
    by sending a `vStopped' packet (*note vStopped packet::) as a
    regular, synchronous request to the stub. Such acknowledgment is
    not required to happen immediately, as GDB is permitted to send
    other, unrelated packets to the stub first, which the stub should
    process normally.

Very nice. Suppose that, before sending vStopped, gdb sends 'D;PID'.
Then it sends vStopped. How should gdbstub reply?

    - OK seems incorrect, it acks the previous T00 but this
      thread/process is already detached.

    - E01? probably, but this is not documented and surely it is not
      right if we have other events to reply (say, multiple inferiors).

    - But, any other reply (especially if we have other stop events
      to reply) acks the previous T00 which is no longer true!

Or, instead of detach from gdb, suppose that the tracee changes its state
by the time gdb sends vStopped in reply to %Stop. Say, it is SIGKILL'ed.

There is no way to let gdb know its state was already changed. We can
only ack the state which was reported previously. And there is no way
to inform gdb there is nothing new and nothing to ack because the
previous notification was already acked (like it happens during the
crash).

And probably this crash (if my understanding is correct) at least proves
that the current scheme is not very convenient.

I do not suggest to discuss this right now, but perhaps we can have
a stateless notification? Say, just '%Stop#..' which informs gdb it
has some events to get via 'vStopped'. In this case any reply to
vStopped does not ack the history, but reports the new event or 'OK'
if no more events. This is at least very understandable and clear.
And simpler.

Oleg.
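
A minimal sketch of how the stateless scheme proposed above could look on
the stub side. Everything here is hypothetical: the class `StatelessStub`
and its methods are invented for illustration, the bare "%Stop" string
stands in for the payload-free notification, and dropping the queued
events of a detached process is an extra assumption that the mail does not
spell out.

```python
from collections import deque


class StatelessStub:
    """Toy model of a payload-free %Stop notification (hypothetical)."""

    def __init__(self):
        self.events = deque()   # (pid, stop reply) pairs not yet reported

    def add_event(self, pid, reply):
        """Queue a stop event and return the notification to send."""
        self.events.append((pid, reply))
        # Carries no event, only "ask me via vStopped", so repeating it
        # cannot be mistaken for a second, different notification.
        return "%Stop"

    def detach(self, pid):
        """Handle D;PID: forget events of the detached process (assumption)."""
        self.events = deque(e for e in self.events if e[0] != pid)
        return "OK"

    def vstopped(self):
        """Acks nothing: report the next event, or OK if there is none."""
        if self.events:
            return self.events.popleft()[1]
        return "OK"


stub = StatelessStub()
n1 = stub.add_event("p100", "T00thread:p100.100;")
n2 = stub.add_event("p200", "T00thread:p200.200;")
stub.detach("p100")       # gdb detaches before it asks for events
r1 = stub.vstopped()      # only the event that is still valid is reported
r2 = stub.vstopped()      # nothing left
r3 = stub.vstopped()      # a redundant vStopped is harmless
```

Under these assumptions a vStopped that arrives after 'D;PID', or after
a duplicate notification, simply gets the next valid event or 'OK', so the
stub never has to decide which earlier notification a given vStopped acks.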