From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 11509 invoked by alias); 5 Oct 2010 17:31:23 -0000 Mailing-List: contact archer-help@sourceware.org; run by ezmlm Sender: Precedence: bulk List-Post: List-Help: List-Subscribe: List-Id: Received: (qmail 11489 invoked by uid 22791); 5 Oct 2010 17:31:21 -0000 X-SWARE-Spam-Status: No, hits=-6.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_LOTS_OF_MONEY,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Date: Tue, 05 Oct 2010 17:31:00 -0000 From: Oleg Nesterov To: archer@sourceware.org, utrace-devel@redhat.com Subject: BUG: gdb && notification packets (Was: gdbstub initial code, v12) Message-ID: <20101005172729.GA27882@redhat.com> References: <20100930182320.GA17475@redhat.com> <20101004181053.GA30217@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20101004181053.GA30217@redhat.com> User-Agent: Mutt/1.5.18 (2008-05-17) X-SW-Source: 2010-q4/txt/msg00002.txt.bz2 On 10/04, Oleg Nesterov wrote: > > While trying to understand what does this mean, I hit another bug > in ugdb. No, /usr/bin/gdb is buggy. > A multithreaded tracee can "hang" if gdb simulates > watchpoints with single-steps + mem-fetch/compare. It doesn't, but gdb "forgets" about the pending "Stop:" notification, see below. > This never > happens if the tracee is single-threaded. This is clear. And this doesn't happen with the real gdbbserver. Probably timing issues, with ugdb the tracee can report itself as stopped much faster (not because ugdb is faster, but because it sends the notification before the tracee actually stops). Consider this simplified artifitial test-case: #include #include #include struct { char x[128]; } VAR; void *tfunc(void *arg) { for (;;) ; } int main(void) { pthread_t thr; printf("pid: %d\n", getpid()); pthread_create(&thr, NULL, tfunc, NULL); tfunc(NULL); return 0; } It just creates 2 threads, each threads spins in the trivial endless loop. VAR is the dummy variable for "watch", it is never changed. It should be big enough to ensure can_use_hardware_watchpoint->default_region_ok_for_hw_watchpoint() returns 0 and gdb falls back to simulating. (gdb) set target-async on (gdb) set non-stop (gdb) target extended-remote :2000 (gdb) attach 1404 (gdb) watch VAR (gdb) c -a After that, sooner or later the tracee "hangs". gdb sleeps in poll(), both traced threads sleep after reporting SIGTRAP (single-step). The more or less "typical" transcript is: [... snip ...] => s <= OK <= %Stop:T05thread:p57c.57d; => vStopped <= T05thread:p57c.57c; => vStopped <= OK => g <= 50994242000000000000000000000000761f0a16677f000000000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => g <= 50994242000000000000000000000000761f0a16677f000000000000000000... => s <= OK <= %Stop:T05thread:p57c.57d; => Hgp57c.57c <= OK => g <= 000000000000000030064000000000008162e115677f000000000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => Hcp57c.57c <= OK => s#73$vStopped <= OK#9a$OK <= %Stop:T05thread:p57c.57c; => Hgp57c.57d#ec$g <= OK#9a$50994242000000000000000000000000761f0a16677f000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => g <= 50994242000000000000000000000000761f0a16677f000000000000000000... => Hcp57c.57d <= OK => s#73$vStopped <= OK#9a$OK <= %Stop:T05thread:p57c.57d; => Hgp57c.57c <= OK => g <= 000000000000000030064000000000008162e115677f000000000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => Hcp57c.57c <= OK => s#73$vStopped <= OK#9a$OK Everything is fine so far. And this is how it works with the real gdbserver: gdb always issues vStopped until it gets the final "OK". Like it should, I guess. So. At this point gdb knows that 57d is stopped, and it sends '$s' to 57c. <= %Stop:T05thread:p57c.57c; 57c reports the next stop, but gdb temporary "ignores" this notification and plays with 57d. This is fine. => Hgp57c.57d#ec$g <= OK#9a$50994242000000000000000000000000761f0a16677f000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => g <= 50994242000000000000000000000000761f0a16677f000000000000000000... => Hcp57c.57d <= OK => s#73$vStopped Finally it does vStopped. But, by this time 57d is stopped again (note that gdb sends '$s' before vStopped). <= OK#9a$T05thread:p57c.57d; ugdb reports 57d as stopped. _If_ gdb sent vStopped right now, he would get "OK" in return. But it doesn't. Again, this is not the bug yet. => Hgp57c.57c <= OK => g <= 000000000000000030064000000000008162e115677f000000000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => Hcp57c.57c <= OK => s <= OK Now, 57c stops. But it can't report this stop. We are "inside" the "vStopped" sequence, ugdb waits for gdb which should complete it by sending the next vStopped packet which should find that 57c is stopped. >From the documentation: If the stub receives a `vStopped' packet and there are no additional stop events to report, the stub shall return an `OK' response. At this point, if further stop events occur, the stub shall send a new stop reply notification, IOW, 57c should not send the notification and it doesn't. But this confuses gdb. In short: gdb forgets that the last 'vStopped' didn't return 'OK' and thus we probably have more stopped tracees but waits for the next notification. ugdb doesn't send the notification because it shouldn't do this until gdb sends the "final" vStopped according to the doc above. Another similar transcript: [... snip ...] => Hcp589.589 <= OK => s <= OK#9a%Stop:T05thread:p589.589; => vStopped <= OK => g <= 0000000000000000300640000000000081621fd01f7f000000000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => s <= OK#9a%Stop:T05thread:p589.589; => vStopped <= OK => g <= 0000000000000000300640000000000081621fd01f7f000000000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => s#73$vStopped <= OK#9a$T05thread:p589.589; => Hgp589.58a#c1$g <= OK#9a$50990141000000000000000000000000761f48d01f7f000000000000... => m600a60,80 <= 00000000000000000000000000000000000000000000000000000000000000... => g <= 50990141000000000000000000000000761f48d01f7f000000000000000000... => Hcp589.58a <= OK => s <= OK Almost the same. Once again, whatever I did I failed to reproduce this hang with the real gdbserver. So I hacked gdb to send the similar sequence of packets instead, and result is the same: getpkt ("Hcp589.589"); [no ack sent] putpkt ("$OK#9a"); [noack mode] getpkt ("s"); [no ack sent] putpkt ("$OK#9a"); [noack mode] pending stop replies: 1 putpkt ("%Stop:T0506:f089f2deff7f0* ;07:f089f2deff7f0* ;10:d40540*';thread:p589.589;core:1;#94"); [notif] getpkt ("Hcp589.58a"); [no ack sent] putpkt ("$OK#9a"); [noack mode] getpkt ("s"); [no ack sent] putpkt ("$OK#9a"); [noack mode] pending stop replies: 2 getpkt ("vStopped"); [no ack sent] vStopped: acking LWP 1417.1417 putpkt ("$T0506:309101410*"00;07:309101410*"00;10:d40540*';thread:p589.58a;core:1;#24"); [noack mode] getpkt ("Hcp589.589"); [no ack sent] putpkt ("$OK#9a"); [noack mode] getpkt ("s"); [no ack sent] putpkt ("$OK#9a"); [noack mode] pending stop replies: 2 Like ugdb, gdbserver does NOT send the new notification, but waits for gdb which should continue and complete the vStopped sequence. So, I strongly believe gdb is buggy and should be fixed. Oleg.