From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-2070-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 9556 invoked by alias); 21 Jul 2010 08:32:47 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 9546 invoked by uid 22791); 21 Jul 2010 08:32:46 -0000
X-SWARE-Spam-Status: No, hits=-6.6 required=5.0
	tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Date: Wed, 21 Jul 2010 08:32:00 -0000
From: Oleg Nesterov <oleg@redhat.com>
To: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>, archer@sourceware.org
Subject: Re: Q: mutlithreaded tracees && clone/exit
Message-ID: <20100721083028.GB5740@redhat.com>
References: <20100718174851.GA15528@redhat.com> <20100716205147.GA26313@redhat.com> <20100719160127.GA13331@host1.dyn.jankratochvil.net> <20100720131615.GA17450@redhat.com> <20100720194119.C0E3C40162@magilla.sf.frob.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100720194119.C0E3C40162@magilla.sf.frob.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-SW-Source: 2010-q3/txt/msg00053.txt.bz2

On 07/20, Roland McGrath wrote:
>
> > Probably this is fine for gdb. But ugdb was started to prototype the
> > new general purpose API. Say, vAttach attaches the whole thread group,
> > there is no way to debug a single thread. Not good in general. The same
> > for D command and for W/X notifications from gdbserver.
>
> It seems fine and normal for whole process to be the granularity of
> attaching.  You need to be able to control the individual threads, of
> course.  But it doesn't really make a lot of sense to "debug" one thread
> and not another in the same process.

I disagree. But currently this is off-topic.

> > However, when this thread exits, gdbserver sends nothing and gdb
> > continues to wait. For what? Another (main) thread is TASK_TRACED,
> > it can do nothing unless it is SIGKILLED.
>
> Yes, it seems like gdb is confusing itself here.
> Perhaps it is not confused that way when in non-stop mode.

No, I did this testing in non-stop mode. With or without target-async.

Just in case, more info. gdb hangs when the sub-thread exits
(as a reminder, gdbserver sends nothing at that point).

If I press ^C, gdb sends "vCont;t:pTGID.PID" and gdbserver replies
"OK". This looks like a bug in gdbserver: the thread no longer
exists, it was already reaped.

So after ^C gdb hangs again, waiting for a gdbserver that does nothing.


This is what gdbserver does when the sub-thread exits:

	select(5, [3 4], [], [3 4], NULL)       = ? ERESTARTNOHAND (To be restarted)
	--- SIGCHLD (Child exited) @ 0 (0) ---

	(the tracee exits)

	read(3, 0x7fffc13431bf, 1)              = -1 EAGAIN (Resource temporarily unavailable)
	write(5, "+", 1)                        = 1
	rt_sigreturn(0x5)                       = -1 EINTR (Interrupted system call)
	select(5, [3 4], [], [3 4], NULL)       = 1 (in [3])
	read(3, "+", 1)                         = 1
	read(3, 0x7fffc13434bf, 1)              = -1 EAGAIN (Resource temporarily unavailable)
	rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
	wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
	wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|__WCLONE, NULL) = 6538

	(this means release_task(), this thread doesn't exist any longer)

	rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
	rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
	wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
	wait4(-1, 0x7fffc134356c, WNOHANG|__WCLONE, NULL) = -1 ECHILD (No child processes)
	rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
	select(5, [3 4], [], [3 4], NULL <unfinished ...>
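
To illustrate the release_task() remark: once a wait call has returned
the exit status, the task is gone and a second wait for the same id can
only fail. A minimal Python sketch of that, with an ordinary forked
child standing in for the traced sub-thread. This is an assumption for
illustration only: gdbserver waits on a clone thread with __WCLONE,
which the sketch does not reproduce, and reap_twice() is a made-up name.

```python
import errno
import os


def reap_twice():
    """Fork a child, reap it, then try to wait for it a second time."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once with status 0

    # First wait: collects the exit status and releases the child.
    reaped, status = os.waitpid(pid, 0)
    assert reaped == pid
    exited = os.WIFEXITED(status)
    code = os.WEXITSTATUS(status)

    # Second wait on the same pid: nothing is left to report.
    try:
        os.waitpid(pid, os.WNOHANG)
        second = 0
    except ChildProcessError as err:
        second = err.errno

    return exited, code, errno.errorcode.get(second, "no error")


print(reap_twice())
```

On Linux this should print (True, 0, 'ECHILD'), the same pattern as the
wait4() that returns 6538 followed by the one that fails with ECHILD.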

So gdbserver sends nothing to gdb. When I press ^C, gdb sends vCont and:

	select(5, [3 4], [], [3 4], NULL)       = 1 (in [4])
	--- SIGIO (I/O possible) @ 0 (0) ---
	read(4, "$vCont;t:p1989.198a#6f", 8192) = 22
	write(4, "$OK#9a", 6)                   = 6
	select(5, [3 4], [], [3 4], NULL <unfinished ...>

gdbserver sends the bogus "OK".
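
The packet in this trace can be decoded by hand. A minimal Python
sketch, assuming the usual remote protocol framing ($payload#checksum,
where the checksum is the sum of the payload bytes modulo 256 in hex)
and the multiprocess thread-id syntax pPID.TID with hex fields. The
function names are invented here, they are not gdbserver's.

```python
def rsp_checksum(payload: str) -> str:
    """Two-digit hex checksum: sum of the payload bytes modulo 256."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def split_packet(packet: str) -> str:
    """Strip the '$...#xx' framing and verify the checksum."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("not a framed packet: %r" % packet)
    payload, checksum = packet[1:-3], packet[-2:]
    if rsp_checksum(payload) != checksum.lower():
        raise ValueError("bad checksum in %r" % packet)
    return payload


def parse_thread_id(thread_id: str) -> tuple:
    """Decode 'pPID.TID' (both fields hex) into decimal (pid, tid)."""
    if not thread_id.startswith("p") or "." not in thread_id:
        raise ValueError("expected pPID.TID, got %r" % thread_id)
    pid, tid = thread_id[1:].split(".", 1)
    return int(pid, 16), int(tid, 16)


# The request and the reply exactly as they appear in the strace output.
payload = split_packet("$vCont;t:p1989.198a#6f")
action, thread_id = payload[len("vCont;"):].split(":", 1)
print(action, parse_thread_id(thread_id))
print(split_packet("$OK#9a"))
```

The tid decodes to 6538, the same id that wait4() returned for the
exited sub-thread in the first trace, so the "OK" acknowledges a stop
request for a thread that had already been reaped.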


The bug is not "fatal": if I press ^C again, gdb sends T, gets the
correct "E01", and detects that the thread has exited. Still, this
looks like an obvious bug.
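
The difference between the two replies can be modelled with a small
table of live threads. A Python sketch, purely hypothetical: ThreadTable
and its methods are invented for illustration and say nothing about how
gdbserver is actually structured. It also assumes the process had only
the two threads discussed here. A reaped thread is dropped from the
table, so a later T query gets "E01"; the same lookup could be made
before acknowledging vCont;t.

```python
class ThreadTable:
    """Hypothetical bookkeeping for a stub; not gdbserver's data structure."""

    def __init__(self):
        self.live = set()

    def add(self, pid, tid):
        self.live.add((pid, tid))

    def reap(self, pid, tid):
        # Called once wait4() has returned this tid's exit status.
        self.live.discard((pid, tid))

    def handle_T(self, thread_id):
        """Reply to a 'T<thread-id>' liveness query ('pPID.TID', hex fields)."""
        pid, tid = thread_id[1:].split(".", 1)
        alive = (int(pid, 16), int(tid, 16)) in self.live
        return "OK" if alive else "E01"


threads = ThreadTable()
threads.add(6537, 6537)   # main thread, still TASK_TRACED
threads.add(6537, 6538)   # sub-thread
threads.reap(6537, 6538)  # sub-thread exited and was reaped

print(threads.handle_T("p1989.198a"))  # the reaped sub-thread
print(threads.handle_T("p1989.1989"))  # the main thread
```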

Oleg.