From: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
To: Pedro Alves <palves@redhat.com>
Cc: gdb-patches@sourceware.org
Subject: Re: Unbreaking gdb on Solaris post-multitarget [PR 25939]
Date: Wed, 17 Jun 2020 16:45:51 +0200
Message-ID: <yddsgetbvj4.fsf@CeBiTec.Uni-Bielefeld.DE>
In-Reply-To: <7fb790ae-61a9-a6a3-3b87-74fcac400664@redhat.com> (Pedro Alves's message of "Tue, 16 Jun 2020 20:16:38 +0100")

Hi Pedro,

> On 6/16/20 3:21 PM, Rainer Orth wrote:
>> Some time ago, when testing gdb master on Solaris again after several
>> months, I discovered that gdb couldn't execute even a trivial program
>> anymore.  This had gone unnoticed by the Solaris buildbots since the
>> code continued to compile just fine.  Those bots are build-only since
>> many tests (especially thread tests) are either flaky or time out.
>> 
>> A reghunt identified the multi-target merge as the culprit.
>
> I'm sorry about that.

No worries: the Solaris port had been in relatively bad shape even
before, so maybe this will allow us to get to the bottom of things and
fix them.

>> I've managed to get a bit further with the following patch which is
>> intended to push the procfs target first:
>
> That patch looks good to me.

Thanks.

>> However, while I now get over the initial assertion failure, I run
>> instead into
>> 
>> procfs: couldn't find pid 0 in procinfo list.
>> procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory.
>> 
>> When I break in procfs.c (procfs_init_inferior), I can see that
>> create_procinfo succeeds.  However, looking at the process tree at this
>> point, I see that the debuggee is still marked as defunct
>> 
>>                   18377 /vol/gcc/bin/gdb -i=mi /vol/gnu/obj/gdb/gdb/reghunt/no-r
>>                     18379 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb 
>>                       18382 <defunct>
>> 
>> so open_procinfo_files fails because /proc/<pid> only contains psinfo
>> and usage, but no ctl file yet.
>> 
>> I tried to do the same with a version of gdb from immediately before the
>> multi-target merge: while that can run a test program interactively just
>> fine, 
>
> It's not clear to me whether you're saying that a version from before
> the multi-target changes can run a test program fine due to not needing
> the push_target fix, or whether the multi-target patchset itself caused
> this second issue you're observing even when debugging a simple hello
> program.

I experimented a bit more yesterday.  Immediately before the
multi-target patch, I have:

$ cat top-gdb.gdb
file ./gdb
run -q -D data-directory -x bottom-gdb.gdb
$ cat bottom-gdb.gdb
file ./hello
b main
run
$ gdb-9 -q -x top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 1, main () at hello.c:6
6	  printf ("Hello world\n");

At that point the process hierarchy is as expected:

                22745 gdb-9 -q -x top-gdb.gdb
                  22761 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
                    22768 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/hell

With the multi-target merge, my push_target patch applied, and worker
threads disabled (more below), I get instead

$ gdb -q -x ~/top-gdb.gdb 
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.

and this process tree:

                23011 gdb-9 -q -x top-gdb.gdb
                  23012 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
                    23013 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/hell

However, if I add

b find_procinfo_or_die

to investigate the above error ("couldn't find pid 0"), with the mt patch
I get

Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1afc288: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c, line 327.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2879, /proc/23022: No such file or directory.
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23022, tid=0)
    at /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c:327
327	  procinfo *pi = find_procinfo (pid, tid);

which is no wonder given the child process is marked as defunct, so its
/proc files cannot be opened:

                23020 gdb-9 -q -x top-gdb.gdb
                  23021 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
                    23022 <defunct>
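
As an aside, a quick way to confirm that state from a shell is to look
for the ctl file directly.  This is only a minimal sketch: has_ctl and
PROC_ROOT are hypothetical names made up for illustration, and the only
fact taken from the observations above is that a defunct process's
/proc/<pid> directory lacks ctl.

```shell
# Hypothetical helper: report whether /proc/<pid> still has a ctl file.
# PROC_ROOT is an assumption added so the check can be pointed at a
# scratch directory; it defaults to the real /proc.
has_ctl() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    if [ -e "$root/$pid/ctl" ]; then
        echo "pid $pid: ctl present"
    else
        echo "pid $pid: no ctl (defunct or gone)"
        return 1
    fi
}
```

Running "has_ctl 23022" while the inferior gdb sits at the breakpoint
would show whether procfs has any chance of opening the control file.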

However, when I try the same in the pre-mt-patch gdb:

Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1ae7e26: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c, line 325.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2870, /proc/23028: No such file or directory.
[New Thread 2        ]
[New Thread 3        ]
[New Thread 4        ]
[New Thread 5        ]
[New Thread 6        ]
[New Thread 7        ]
[New Thread 8        ]
[New Thread 9        ]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23028, tid=0) at /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c:325
325	  procinfo *pi = find_procinfo (pid, tid);

I get the same error and the same defunct process:

                23026 gdb-9 -q -x top-gdb.gdb
                  23027 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
                    23028 <defunct>

This obviously makes debugging extra hard ;-(  However, this error isn't
entirely new: when running the gdb testsuite before the mt merge, I get
several variations of this error:

$ grep -a "couldn't find pid" gdb.log |sort|uniq -c
      2 Error in re-setting breakpoint 2: procfs: couldn't find pid 0 in procinfo list.
      2 Error in re-setting breakpoint 5: procfs: couldn't find pid 0 in procinfo list.
     99 procfs: couldn't find pid -1 in procinfo list.
     22 procfs: couldn't find pid 0 in procinfo list.
      5 procfs: couldn't find pid 21415 in procinfo list.
      5 procfs: couldn't find pid 21618 in procinfo list.
     10 procfs: couldn't find pid 22032 in procinfo list.
      5 procfs: couldn't find pid 22457 in procinfo list.
      5 procfs: couldn't find pid 22678 in procinfo list.
     10 procfs: couldn't find pid 22985 in procinfo list.
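
Since the per-pid lines make that listing noisy, the counts can also be
grouped by the kind of pid reported.  The following is a sketch only:
classify_pids is a hypothetical helper name, and the three buckets
(-1, 0, and "real" for any other pid) are my own grouping, not
something procfs itself distinguishes.

```shell
# Hypothetical helper: read log lines on stdin and count the
# "couldn't find pid N" errors, bucketed as -1, 0, or "real".
classify_pids() {
    # Extract just the pid from each matching line.
    sed -n "s/.*couldn't find pid \(-\{0,1\}[0-9][0-9]*\) in procinfo list.*/\1/p" |
    # Tally per bucket; any pid other than 0 or -1 counts as "real".
    awk '{ k = ($1 == "0" || $1 == "-1") ? $1 : "real"; n[k]++ }
         END { for (k in n) print n[k], k }' |
    LC_ALL=C sort -k2
}
```

It would be used as "classify_pids < gdb.log".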

>> running that gdb under gdb itself most often leads to the same
>> error.  This very much seems like a race condition to me, but at the
>> moment I'm pretty much at a loss how to investigate this further.
>
> Could this be a race somehow more exposed now due to GDB now spawning worker
> threads?  What happens if you debug a GDB that doesn't spawn worker
> threads?  Like:
>
> ./gdb -D ./data-directory --args ./gdb -ex "maint set worker-threads 0"

This doesn't work because master gdb cannot debug anything, with or
without the push_target fix.

When I instead use gdb 9.1 as the top gdb, I get

$ gdb-9 -q --args ./gdb -D data-directory -ex "maint set worker-threads 0"
Reading symbols from ./gdb...
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
(top-gdb) run
Starting program: /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb can't handle command-line argument containing whitespace

When I instead pass the commands via command files, I get

$ cat top-gdb-mt.gdb
file ./gdb-mt
run -q -D data-directory -x bottom-gdb-mt.gdb
$ cat bottom-gdb-mt.gdb
maint set worker-threads 0
file ./hello
b main
run
$ gdb-9 -q -x top-gdb-mt.gdb
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
[LWP    8         exited]
[New LWP    8        ]
[LWP    6         exited]
[New LWP    6        ]
[LWP    9         exited]
[New LWP    9        ]
[LWP    5         exited]
[New LWP    5        ]
[LWP    7         exited]
[New LWP    7        ]
[LWP    2         exited]
[New LWP    2        ]
[LWP    3         exited]
[New LWP    3        ]
[LWP    4         exited]
[New LWP    4        ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb-mt.gdb:4: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.

> Does that problem trigger as often that way?

The failure is still reproducible that way, but even more verbose
(imagine that on that 160-core system I spoke of ;-)

To avoid that, I've changed n_worker_threads to 0 for now.

> Or, what happens if you use master GDB with your push_target fix
> to debug an older GDB?

Master GDB cannot debug anything, unfortunately.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
