From: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
To: Pedro Alves <palves@redhat.com>
Cc: gdb-patches@sourceware.org
Subject: Re: Unbreaking gdb on Solaris post-multitarget [PR 25939]
Date: Wed, 17 Jun 2020 16:45:51 +0200
Message-ID: <yddsgetbvj4.fsf@CeBiTec.Uni-Bielefeld.DE>
In-Reply-To: <7fb790ae-61a9-a6a3-3b87-74fcac400664@redhat.com> (Pedro Alves's message of "Tue, 16 Jun 2020 20:16:38 +0100")

Hi Pedro,

> On 6/16/20 3:21 PM, Rainer Orth wrote:
>> Some time ago, when testing gdb master on Solaris again after several
>> months, I discovered that gdb couldn't execute even a trivial program
>> anymore. This had gone unnoticed by the Solaris buildbots since the
>> code continued to compile just fine. Those bots are build-only since
>> many tests (especially thread tests) are either flaky or time out.
>>
>> A reghunt identified the multi-target merge as the culprit.
>
> I'm sorry about that.
No worries: the Solaris port had already been in relatively bad shape
before, so maybe this will allow us to get to the bottom of things and
fix them.
>> I've managed to get a bit further with the following patch which is
>> intended to push the procfs target first:
>
> That patch looks good to me.
Thanks.
>> However, while I now get over the initial assertion failure, I run
>> instead into
>>
>> procfs: couldn't find pid 0 in procinfo list.
>> procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory.
>>
>> When I break in procfs.c (procfs_init_inferior), I can see that
>> create_procinfo succeeds. However, looking at the process tree at this
>> point, I see that the debuggee is still marked as defunct
>>
>> 18377 /vol/gcc/bin/gdb -i=mi /vol/gnu/obj/gdb/gdb/reghunt/no-r
>> 18379 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb
>> 18382 <defunct>
>>
>> so open_procinfo_files fails because /proc/<pid> only contains psinfo
>> and usage, but no ctl file yet.
>>
>> I tried to do the same with a version of gdb from immediately before the
>> multi-target merge: while that can run a test program interactively just
>> fine,
>
> It's not clear to me whether you're saying that a version from before
> the multi-target changes can run a test program fine due to not needing
> the push_target fix, or whether the multi-target patchset itself caused
> this second issue you're observing even when debugging a simple hello
> program.
I experimented a bit more yesterday. Immediately before the
multi-target patch, I have:
$ cat top-gdb.gdb
file ./gdb
run -q -D data-directory -x bottom-gdb.gdb
$ cat bottom-gdb.gdb
file ./hello
b main
run
$ gdb-9 -q -x top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]
Thread 2 hit Breakpoint 1, main () at hello.c:6
6 printf ("Hello world\n");
At that point the process hierarchy is as expected:
22745 gdb-9 -q -x top-gdb.gdb
22761 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
22768 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/hell
With the multi-target merge, my push_target fix, and the worker threads
disabled (more below), I get instead:
$ gdb -q -x ~/top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.
and this process tree:
23011 gdb-9 -q -x top-gdb.gdb
23012 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
23013 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/hell
However, if I add
b find_procinfo_or_die
to investigate the above error ("couldn't find pid 0"), with the mt patch
I get
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1afc288: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c, line 327.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2879, /proc/23022: No such file or directory.
[Switching to Thread 1 (LWP 1)]
Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23022, tid=0)
at /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c:327
327 procinfo *pi = find_procinfo (pid, tid);
which is no surprise given that the child process is marked as defunct,
so its /proc files cannot be opened:
23020 gdb-9 -q -x top-gdb.gdb
23021 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
23022 <defunct>
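A small shell sketch of the distinction described here, assuming the Solaris /proc layout from this thread: a live process has a /proc/<pid>/ctl control file, while a defunct one only keeps a few files such as psinfo and usage. The helper name proc_ctl_status and its optional PROCROOT argument (only there so the check can be pointed at a test directory) are hypothetical, not part of gdb or Solaris.

```shell
#!/bin/sh
# Hypothetical helper: classify a Solaris-style /proc/<pid> entry.
# Usage: proc_ctl_status PID [PROCROOT]   (PROCROOT defaults to /proc)
proc_ctl_status() {
    pid=$1
    procroot=${2:-/proc}
    dir=$procroot/$pid

    if [ ! -d "$dir" ]; then
        # No /proc entry: no such process (never existed or already reaped).
        echo gone
    elif [ -e "$dir/ctl" ]; then
        # Control file present, as procfs needs for a live process.
        echo controllable
    else
        # Entry exists but has no ctl file, e.g. a <defunct> process.
        echo no-ctl
    fi
}
```

For example, `proc_ctl_status 23022` run while that process is listed as <defunct> would be expected to print no-ctl. Linux /proc has no ctl file at all, so the check is only meaningful on Solaris.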
However, when I try the same in the pre-mt-patch gdb:
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1ae7e26: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c, line 325.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2870, /proc/23028: No such file or directory.
[New Thread 2 ]
[New Thread 3 ]
[New Thread 4 ]
[New Thread 5 ]
[New Thread 6 ]
[New Thread 7 ]
[New Thread 8 ]
[New Thread 9 ]
[Switching to Thread 1 (LWP 1)]
Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23028, tid=0) at /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c:325
325 procinfo *pi = find_procinfo (pid, tid);
I get the same error and the same defunct process:
23026 gdb-9 -q -x top-gdb.gdb
23027 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
23028 <defunct>
This obviously makes debugging extra hard ;-( However, this error isn't
entirely new: when running the gdb testsuite before the mt merge, I get
several variations of this error:
$ grep -a "couldn't find pid" gdb.log |sort|uniq -c
2 Error in re-setting breakpoint 2: procfs: couldn't find pid 0 in procinfo list.
2 Error in re-setting breakpoint 5: procfs: couldn't find pid 0 in procinfo list.
99 procfs: couldn't find pid -1 in procinfo list.
22 procfs: couldn't find pid 0 in procinfo list.
5 procfs: couldn't find pid 21415 in procinfo list.
5 procfs: couldn't find pid 21618 in procinfo list.
10 procfs: couldn't find pid 22032 in procinfo list.
5 procfs: couldn't find pid 22457 in procinfo list.
5 procfs: couldn't find pid 22678 in procinfo list.
10 procfs: couldn't find pid 22985 in procinfo list.
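To separate the special values 0 and -1 from lookups of concrete PIDs in such a log, a sketch along these lines could replace the plain grep | sort | uniq -c. It assumes the exact message wording shown above; classify_pid_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count "couldn't find pid N in procinfo list" errors
# in a gdb.log, split into pid 0, pid -1 and all other (real) PIDs.
# Usage: classify_pid_errors gdb.log
classify_pid_errors() {
    # Extract just the PID (optionally negative) from each matching line.
    sed -n "s/.*couldn't find pid \(-\{0,1\}[0-9][0-9]*\) in procinfo list.*/\1/p" "$1" |
    awk '{ if ($1 == 0) zero++; else if ($1 == -1) minus_one++; else real++ }
         END {
             printf "pid 0: %d\n", zero
             printf "pid -1: %d\n", minus_one
             printf "real pids: %d\n", real
         }'
}
```

Lines that don't contain the message (breakpoint notices and so on) are skipped because of sed's -n flag.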
>> running that gdb under gdb itself most often leads to the same
>> error. This very much seems like a race condition to me, but at the
>> moment I'm pretty much at a loss how to investigate this further.
>
> Could this be a race somehow more exposed now due to GDB now spawning worker
> threads? What happens if you debug a GDB that doesn't spawn worker
> threads? Like:
>
> ./gdb -D ./data-directory --args ./gdb -ex "maint set worker-threads 0"
This doesn't work because master gdb cannot debug anything, with or
without the push_target fix.
When I use gdb 9.1 as the top gdb instead, I get
$ gdb-9 -q --args ./gdb -D data-directory -ex "maint set worker-threads 0"
Reading symbols from ./gdb...
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
(top-gdb) run
Starting program: /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb can't handle command-line argument containing whitespace
When instead I use
$ cat top-gdb-mt.gdb
file ./gdb-mt
run -q -D data-directory -x bottom-gdb-mt.gdb
$ cat bottom-gdb-mt.gdb
maint set worker-threads 0
file ./hello
b main
run
$ gdb-9 -q -x top-gdb-mt.gdb
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
[LWP 8 exited]
[New LWP 8 ]
[LWP 6 exited]
[New LWP 6 ]
[LWP 9 exited]
[New LWP 9 ]
[LWP 5 exited]
[New LWP 5 ]
[LWP 7 exited]
[New LWP 7 ]
[LWP 2 exited]
[New LWP 2 ]
[LWP 3 exited]
[New LWP 3 ]
[LWP 4 exited]
[New LWP 4 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb-mt.gdb:4: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.
> Does that problem trigger as often that way?
The failure is still reproducible that way, but the output is even more
verbose (imagine that on the 160-core system I spoke of ;-).
To avoid that, I've changed n_worker_threads to 0 for now.
> Or, what happens if you use master GDB with your push_target fix
> to debug an older GDB?
Master GDB cannot debug anything, unfortunately.
Rainer
--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University