From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-2095-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 18712 invoked by alias); 3 Aug 2010 12:27:20 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 18689 invoked by uid 22791); 3 Aug 2010 12:27:13 -0000
X-SWARE-Spam-Status: No, hits=-6.6 required=5.0
	tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Date: Tue, 03 Aug 2010 12:27:00 -0000
From: Oleg Nesterov <oleg@redhat.com>
To: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Roland McGrath <roland@redhat.com>, archer@sourceware.org,
        utrace-devel@redhat.com
Subject: Q: %Stop && gdb crash
Message-ID: <20100803122434.GA32698@redhat.com>
References: <20100716205147.GA26313@redhat.com> <20100721170400.GA30978@redhat.com> <20100721204203.D040C400B6@magilla.sf.frob.com> <20100723173134.GA29717@redhat.com> <20100726142759.GA17171@redhat.com> <20100728181702.GA26678@redhat.com> <20100802235358.GA9720@host1.dyn.jankratochvil.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100802235358.GA9720@host1.dyn.jankratochvil.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-SW-Source: 2010-q3/txt/msg00078.txt.bz2

On 08/03, Jan Kratochvil wrote:
>
> On Wed, 28 Jul 2010 20:17:02 +0200, Oleg Nesterov wrote:
>
> > Btw, gdb crashes very often right after
> >
> > 	(gdb) set target-async on
> > 	(gdb) set non-stop
> > 	(gdb) file mt-program
> > 	(gdb) target extended-remote :port
> > 	(gdb) attach its_pid
> >
> > I didn't even try to investigate (this doesn't happen when
> > it works with the real gdbserver). Just retry, gdb is buggy.
                                                   ^^^^^^^^^^^^
Yes, I still think gdb is wrong, but please correct me.

> Trying it with both /bin/sleep and a threaded testcase and I never got a crash
> (kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).

To clarify, let me repeat: I never saw such a crash with the real
gdbserver, but this often happens in my testing.


I think I understand what happens. And this leads to the question
about the %Stop notifications which I was going to delay, see below.

I just reproduced the crash. I entered the following commands via
the CLI:

	(gdb) set target-async on
	(gdb) set non-stop
	(gdb) target extended-remote :2000
	(gdb) file mt

Everything is OK so far. "mt" is not interesting, just a simple
application with 4 sleeping threads.

Then gdb crashes during attach:

	(gdb) attach 24291
	Attached to process 24291
	[New Thread 24291.24291]
	[New Thread 24291.24292]
	[New Thread 24291.24293]
	[New Thread 24291.24294]
	Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
	Loaded symbols for /lib64/libpthread.so.0
	Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
	Loaded symbols for /lib64/libc.so.6
	Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
	Loaded symbols for /lib64/ld-linux-x86-64.so.2
	0x000000375faf21ce in __lll_lock_wait_private () from /lib64/libc.so.6
	(gdb)
	[Thread 24291.24293] #3 stopped.
	0x000000375faf21ce in __lll_lock_wait_private () from /lib64/libc.so.6

	[Thread 24291.24292] #2 stopped.
	0x000000375fad65cb in read () from /lib64/libc.so.6

	[Thread 24291.24291] #1 stopped.
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	inline-frame.c:335: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.
	A problem internal to GDB has been detected,

And I think this is because of %Stop issues.

From gdb.info:

	Because the
	notification mechanism is unreliable, the stub is permitted to resend a
	stop reply notification if it believes GDB may not have received it.
	GDB ignores additional stop reply notifications received before it has
	finished processing a previous notification and the stub has completed
	sending any queued stop events.

So I assumed it is always safe to resend the notification unless gdb has
already sent vStopped. Since it is not clear to me when it makes sense to
resend it, currently gdbstub re-sends every time /proc/ugdb reports a new
event (T00 in this case). I agree this is not optimal, but this looks correct
to me.

However, gdb.info also states:

	Only one stop reply notification at a time may be pending; if
	additional stop events occur before GDB has acknowledged the previous
	notification, they must be queued by the stub for later synchronous
	transmission in response to `vStopped' packets from GDB.

That is why gdbstub re-sends the same notification, until it gets vStopped.
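
The resend rule described above can be sketched as a toy Python model. This is
only an illustration of how the two gdb.info passages are read here, not the
real gdbstub code: the class name, the method names and the idea that events
arrive in per-poll batches are all assumptions made for the sketch.

```python
from collections import deque


class StubNotifier:
    """Toy model of the stub side of %Stop (hypothetical, not gdbstub itself)."""

    def __init__(self):
        self.notified = None   # stop reply announced via %Stop, not yet acked
        self.queued = deque()  # later stop replies, held back for vStopped
        self.acking = False    # True while gdb drains the queue with vStopped
        self.wire = []         # packets "sent" to gdb, in order

    def on_new_events(self, stop_replies):
        """Called once per batch of new stop events (assumed: one per poll)."""
        self.queued.extend(stop_replies)
        if self.acking:
            return  # gdb will pick these up with its next vStopped
        if self.notified is None and self.queued:
            self.notified = self.queued.popleft()
        if self.notified is not None:
            # Re-send the SAME notification until gdb starts acking it.
            self.wire.append("%Stop:" + self.notified)

    def on_vstopped(self):
        """Reply to vStopped: next queued stop reply, or OK when drained."""
        self.acking = True
        if self.queued:
            reply = self.queued.popleft()
        else:
            reply = "OK"
            self.notified = None
            self.acking = False
        self.wire.append(reply)
        return reply
```

Feeding it one batch with 5ee4 and a second batch with the other three threads
produces two identical %Stop packets followed by three stop replies and OK,
which is the shape of the log below.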

Now let's look into the log:

	=> vAttach;5ee3
	<= OK
	=> qfThreadInfo
	<= mp5ee3.5ee3,p5ee3.5ee4,p5ee3.5ee5,p5ee3.5ee6
	=> qsThreadInfo
	<= l
	=> Hgp5ee3.5ee3
	<= OK
	=> vCont?
	<= vCont;t;c;C;s;S
	=> vCont;t:p5ee3.-1
	<= OK

Note: gdbstub reports OK before any thread actually stops. I believe
this is correct from the remote protocol pov, and this is what we want.

	<= Stop:T00thread:p5ee3.5ee4;

Some thread actually stops, we send the notification.

	<= Stop:T00thread:p5ee3.5ee4;

Another thread stops, gdbstub resends the same notification according
to the docs above (or according to my understanding).

Note: this doesn't happen _every time_. In the more likely case all
threads are already stopped when ->poll() succeeds. But sometimes
some thread stops a little bit later.

Once again, please note that both notifications are the same thing,
but I guess gdb doesn't understand this, see below.

Then,

	=> vStopped
	<= T00thread:p5ee3.5ee3;
	=> vStopped
	<= T00thread:p5ee3.5ee6;
	=> vStopped
	<= T00thread:p5ee3.5ee5;
	=> vStopped
	<= OK
	=> Hgp5ee3.5ee4
	<= OK
	=> g
	<= 00feffffffffffffa066d65f37000000ffffffffffffffff00040000000...

	[...snip a lot of $m packets ]

Everything is fine so far. Then,

	=> vCont;t:p5ee3.-1
	<= OK

Well. I hope this 'OK' without the subsequent notifications matches
the documentation:

	vCont[;ACTION[:THREAD-ID]]...

	...
	The `t' action is only relevant in non-stop mode
	...
	A stop reply should be generated for any affected thread
	not already stopped.

IIUC, "already stopped" means "already reported as stopped to gdb".
So gdbstub replies 'OK' and doesn't send any %Stop packets, but gdb
seems to expect the new STOP-REPLY packets:

	=> m375fad65cb,1
	<= 48
	=> m375fad65cb,1
	<= 48
	=> vStopped

And what should I do in this case??? Probably, this vStopped pairs with the
_second_ notification above. But gdbstub has already acked this notification
during the previous vStopped sequence. E01? This seems to confuse gdb.

From gdb.info:

	`vStopped'
	     In non-stop mode (*note Remote Non-Stop::), acknowledge a previous
	     stop reply and prompt for the stub to report another one.

	     Reply:
	    `Any stop packet'
		  if there is another unreported stop event (*note Stop Reply 
		  Packets::)

	    `OK'
		  if there are no unreported stop events

So I am sending 'OK' because there are no unreported stop events. But this
seems to confuse gdb, it thinks this 'OK' acks the second notification:

	<= OK
	=> Hgp5ee3.5ee3
	<= OK
	=> g
	<= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
	=> m33af60e57d,1
	<= 48
	=> m33af60e57d,1
	<= 48
	=> Hgp5ee3.5ee6
	<= OK
	=> g
	<= 00feffffffffffffa066d65f37000000ffffffffffffffff02000000000...
	=> m375faf21ce,1
	<= 89
	=> m375faf21ce,1
	<= 89
	=> Hgp5ee3.5ee3
	<= OK
	=> g
	<= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
	=> m33af60e57d,1
	<= 48
	=> m33af60e57d,1
	<= 48
	=> Hgp5ee3.5ee5
	<= OK
	=> g
	<= 00feffffffffffffa066d65f37000000ffffffffffffffff02000000000...
	=> m375faf21ce,1
	<= 89
	=> m375faf21ce,1
	<= 89
	=> Hgp5ee3.5ee3
	<= OK
	=> g
	<= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
	=> m33af60e57d,1
	<= 48
	=> m33af60e57d,1
	<= 48
	=> Hgp5ee3.5ee4
	<= OK

Note: _this_ thread was reported twice via %Stop.

	=> g
	<= 00feffffffffffffa066d65f37000000ffffffffffffffff00040000000...

Amen, gdb crashes. Indeed, it has already looked at this thread (see
another Hgp5ee3.5ee4 above).

Jan, I am not sure but _IIRC_ I observed other scenarios where gdb
crashes during the attach, but I can't reproduce them right now.

==========================================================================
Now, let's talk about %Stop.

I must admit, I believe the idea behind %Stop in its current state
is not very good. First of all, it is not clear how this all can
be implemented correctly. Forget about the multithreading, consider
the simplest case: gdb traces a single thread, this thread stops,
gdbserver sends '%Stop:T00thread:pPID.PID;'.

From gdb.info:

	After receiving a stop reply notification, GDB shall acknowledge it
	by sending a `vStopped' packet (*note vStopped packet::) as a regular,
	synchronous request to the stub.  Such acknowledgment is not required
	to happen immediately, as GDB is permitted to send other, unrelated
	packets to the stub first, which the stub should process normally.

Very nice. Suppose that, before sending vStopped, gdb sends 'D;PID'.
Then it sends vStopped. How should gdbstub reply?

	- OK seems incorrect, it acks the previous T00 but this
	  thread/process is already detached.

	- E01? probably, but this is not documented and surely
	  it is not right if we have other events to reply
	  (say, multiple inferiors).

	- But, any other reply (especially if we have other stop
	  events to reply) acks the previous T00 which is no longer
	  true!
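
The three options above can be made concrete with a small Python sketch. It is
a toy model with hypothetical names (`Stub`, `stop`, `detach`, `vstopped`) and
made-up pids, and it only tracks which event the next vStopped would ack.

```python
from collections import deque


class Stub:
    """Toy stub: one unacked stop reply plus a queue, keyed by process id."""

    def __init__(self):
        self.unacked = None    # (pid, stop_reply) last reported, awaiting vStopped
        self.queued = deque()  # (pid, stop_reply) not yet reported to gdb
        self.detached = set()  # pids gdb has detached via D;PID

    def stop(self, pid, stop_reply):
        if self.unacked is None:
            self.unacked = (pid, stop_reply)  # would go out as %Stop:...
        else:
            self.queued.append((pid, stop_reply))

    def detach(self, pid):
        # D;PID: drop what was not reported yet; the %Stop that is already
        # on the wire cannot be taken back.
        self.detached.add(pid)
        self.queued = deque(e for e in self.queued if e[0] != pid)

    def vstopped(self):
        """Return (reply, stale), stale meaning the ack hits a detached pid."""
        stale = self.unacked is not None and self.unacked[0] in self.detached
        self.unacked = self.queued.popleft() if self.queued else None
        reply = self.unacked[1] if self.unacked else "OK"
        return reply, stale


stub = Stub()
stub.stop(0x10, "T00thread:p10.10;")  # notified via %Stop
stub.stop(0x20, "T00thread:p20.20;")  # second inferior, queued
stub.detach(0x10)                     # gdb sends D;10 before vStopped
first = stub.vstopped()
second = stub.vstopped()
```

Whatever the stub answers to the first vStopped, `stale` is true: the reply
necessarily acks a stop of a process that is no longer attached.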

Or, instead of a detach from gdb, suppose that the tracee changes
its state by the time gdb sends vStopped in reply to %Stop. Say, it
is SIGKILL'ed. There is no way to let gdb know its state was already
changed. We can only ack the state which was reported previously.
And there is no way to inform gdb there is nothing new and nothing
to ack because the previous notification was already acked (like
it happens during the crash).

And probably this crash (if my understanding is correct) at least
proves that the current scheme is not very convenient.


I do not suggest discussing this right now, but perhaps we can have
a stateless notification? Say, just '%Stop#..' which informs gdb
it has some events to get via 'vStopped'. In this case any reply
to vStopped does not ack the history, but reports the new event or
'OK' if no more events. This is at least very understandable and
clear. And simpler.
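
As a rough Python sketch of what such a stateless scheme could look like on the
stub side (purely hypothetical: the class, its methods and the "latest state
wins" policy are assumptions, and packet framing and checksums are left out):

```python
class StatelessStub:
    """Toy stub where %Stop carries no payload and vStopped acks nothing."""

    def __init__(self):
        self.events = {}  # thread id -> latest unreported stop reply
        self.wire = []    # notifications "sent" to gdb

    def on_event(self, tid, stop_reply):
        # A newer state simply replaces an older, still unreported one.
        self.events[tid] = stop_reply
        # Payload-free hint: "there is something to fetch". Resending it is
        # harmless because it does not describe any particular event.
        self.wire.append("%Stop")

    def forget(self, tid):
        # E.g. after a detach: nothing was promised to gdb, so just drop it.
        self.events.pop(tid, None)

    def on_vstopped(self):
        # Report the oldest unreported event, or OK if there is none.
        if not self.events:
            return "OK"
        tid = next(iter(self.events))
        return self.events.pop(tid)


stub = StatelessStub()
stub.on_event("p1.1", "T00thread:p1.1;")
stub.on_event("p1.1", "X09;process:1")  # tracee killed before gdb asked
replies = [stub.on_vstopped() for _ in range(3)]
```

Here gdb only ever sees the current state of the thread, and an extra vStopped
just gets 'OK' without acking anything.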

Oleg.