public inbox for gdb-prs@sourceware.org
help / color / mirror / Atom feed
* [Bug gdb/28471] New: Internal error on Assertion `pid != 0' on AIX
@ 2021-10-18 23:38 kadler at us dot ibm.com
  2021-10-19 18:15 ` [Bug gdb/28471] " simon.marchi at polymtl dot ca
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: kadler at us dot ibm.com @ 2021-10-18 23:38 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28471

            Bug ID: 28471
           Summary: Internal error on Assertion `pid != 0' on AIX
           Product: gdb
           Version: 11.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: kadler at us dot ibm.com
  Target Milestone: ---

[infrun] handle_inferior_event: status->kind = exited, status = 0
../../gdb/inferior.c:306: internal-error: inferior*
find_inferior_pid(process_stratum_target*, int): Assertion `pid != 0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

This was with a simple "hello world" program.

The problem seems to be that target_wait is returning a ptid of pid 0. On AIX,
this seems to be implemented by aix_thread_target::wait in aix-thread.c which
calls rs6000_nat_target::wait in rs6000-nat.c.

While debugging, I noticed that waitpid would get a valid pid and then call
waitpid again, which would print out "Child process unexpectedly missing: no
such process" and return -1. I believe rs6000_nat_target::wait in rs6000-nat.c
needs similar change as commit f37e5866aa72e76f2199155fb838ffc25c78a26e, but
even after making that change I'm still getting the same error.

After making that change, rs6000_nat_target::wait is now returning the correct
pid, but aix_thread_target::wait is not. The interesting thing I noticed is
that this function *never* returns or really does anything with the ptid that
it got back. Instead it either calls pd_active or pd_update. In this case,
pd_update(0) would be called, which does *something* with the pthdebug (I'm not
too familiar with these APIs) and seems to rely on using inferior_ptid, however
with the addition of Multi-target support in
5b6d1e4fa4fc6827c7b3f0e99ff120dfa14d65d2 switch_to_inferior_no_thread is now
called. If I comment this line out, the issue goes away. Of course, the comment
preceding it might have some words with the AIX code:

  /* We know that we are looking for an event in the target of inferior
     INF, but we don't know which thread the event might come from.  As
     such we want to make sure that INFERIOR_PTID is reset so that none of
     the wait code relies on it - doing so is always a mistake.  */


Seems like a lot of the aix_thread code is a mistake :D

In any case, I'm not exactly sure how to fix the pd_activate / pd_update.
Perhaps a ptid needs to be passed in (passing in inferior_ptid in the when not
called from wait)?

 The wait code could probably also use an early return when exited like in
bsd-uthread.c and sol-thread.c:

  /* If the process is no longer alive, there's no point in figuring
     out the thread ID.  It will fail anyway.  */
  if (status->kind == TARGET_WAITKIND_SIGNALLED
      || status->kind == TARGET_WAITKIND_EXITED)
    return ptid;

Of course, then sync_threadlists wouldn't be called from wait. Does that
matter?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gdb/28471] Internal error on Assertion `pid != 0' on AIX
  2021-10-18 23:38 [Bug gdb/28471] New: Internal error on Assertion `pid != 0' on AIX kadler at us dot ibm.com
@ 2021-10-19 18:15 ` simon.marchi at polymtl dot ca
  2021-10-19 19:17 ` kadler at us dot ibm.com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: simon.marchi at polymtl dot ca @ 2021-10-19 18:15 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28471

Simon Marchi <simon.marchi at polymtl dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simon.marchi at polymtl dot ca

--- Comment #1 from Simon Marchi <simon.marchi at polymtl dot ca> ---
Hi,

Not too familiar with AIX, but I'll try to help.  The AIX target suffers from
bit-rot, as you found out, so needs a bit of updating.

The inferior_ptid is used as "the current thread" value.  A lot of functions in
GDB work on a ptid or thread, but don't take a ptid or thread as argument. 
Instead, they read inferior_ptid.

But as explained in the comment you quoted, target_wait doesn't work that way. 
It is used to pull an event out of the target, it doesn't operate on a specific
thread.  Clearing inferior_ptid prior to calling target_wait (instead of
leaving it set to some random value) ensures the target doesn't rely on it,
like some targets historically did.

So already, the first line of aix_thread_target::wait is suspicious:

  scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);

This saves the value of inferior_ptid and restores it at the end of the scope. 
But we know there is nothing to restore, the entry value of inferior_ptid is
meaningless.

What I understand from the current code (it would need to be confirmed by
somebody who knows AIX) is that we are dealing with something like userspace
threads.  The process layer (rs6000-nat) just knows about a single process /
thread, with ptid (pid=1234, lwp=0, tid=0).  The aix-thread target sits on top
and exposes the threads that are managed by some userspace threading library. 
Threads managed by aix-thread would therefore have ptids (pid=1234, lwp=0,
tid=1), (pid=1234, lwp=0, tid=2), and so forth.

The ptid argument passed to target_wait acts as a filter, it asks the target
"get me events only for that ptid".  So let's say the core of GDB passes
ptid=(pid=1234, lwp=0, tid=2), the aix-thread modifies that ptid to clear the
tid field (function pid_to_prc) before passing the filter to the target below,
because the tid field doesn't make sense to the target below and would confuse
it.  I think that is correct, but the subsequent line:

  inferior_ptid = ptid_t (inferior_ptid.pid ());

doesn't make sense anymore, and should be removed.

Then, here:

  if (ptid.pid () == -1)
    return ptid_t (-1);

Means: if the target didn't produce an interesting event, return early.  This
looks correct, but could be made nicer as:

  if (ptid == minus_one_ptid)
    return minus_one_ptid;

Now, if the target below did return an interesting even, we go in pd_activate
or pd_update.  These functions rely on inferior_ptid, but they shouldn't that
needs to be fixed.  Let's assume that the target below returned an event for
ptid (pid=1234, lwp=0, tid=0), I suppose that pd_activate and pd_update will do
some bookeeping and will fill in that tid field, so that
aix_thread_target::wait returns a ptid for a specific tid.  These functions
used to rely on inferior_ptid, I suppose they can just be changed to accept the
ptid (the one returned by the target below) as argument.  And do the thing that
makes sense, like:

    /* If this target is not active, return the ptid as returned by the target
below.  */
    if (!pd_active)
      return ptid_passed_as_argument;  // used to say inferior_ptid

One question I would have for an AIX expert is: is this threading model /
library still used today, or is it just some ancient stuff?  Userspace threads
are rare these days, so it's possible that this aix-thread target isn't
actually useful, and just gets in the way, and that threads are managed and
exposed by the kernel, and therefore managed by the rs6000-nat target  If so,
maybe aix-thread.c could be removed.

And one last thing: since rs6000-nat.c is AIX-specific, it should be named as
such, rs6000-aix-nat.c.  If you plan on touching that area, could you please
make this change?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gdb/28471] Internal error on Assertion `pid != 0' on AIX
  2021-10-18 23:38 [Bug gdb/28471] New: Internal error on Assertion `pid != 0' on AIX kadler at us dot ibm.com
  2021-10-19 18:15 ` [Bug gdb/28471] " simon.marchi at polymtl dot ca
@ 2021-10-19 19:17 ` kadler at us dot ibm.com
  2022-08-05 21:26 ` kadler at us dot ibm.com
  2022-08-05 21:44 ` simark at simark dot ca
  3 siblings, 0 replies; 5+ messages in thread
From: kadler at us dot ibm.com @ 2021-10-19 19:17 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28471

--- Comment #2 from Kevin Adler <kadler at us dot ibm.com> ---
Thanks, I did make some changes last night that were along the lines you talked
about.

Firstly, functions take in a ptid_t parameter now and don't rely on
inferior_ptid.

- get_signaled_thread now returns a thread_info* and the iterate_over_threads
call from pd_update was moved there. This was only needed when pd_activate(1)
was called in pd_enable, which now calls get_signaled_thread. The
switch_to_thread call that was formerly in pd_update is now there too, since
that was the only place it was used.

- pd_update now ends after sync_threadlists.

- pd_activate no longer calls pd_update and takes no args. Callers now call
pd_update() themselves.


This seems to have fixed the issue for me. No clue whether the whole
sync_threadlists is still needed or the larger question around user-space
thread support in general.

I'll look at the other changes you mentioned: removing the scope_restorer, the
code that sets inferior_ptid, etc.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gdb/28471] Internal error on Assertion `pid != 0' on AIX
  2021-10-18 23:38 [Bug gdb/28471] New: Internal error on Assertion `pid != 0' on AIX kadler at us dot ibm.com
  2021-10-19 18:15 ` [Bug gdb/28471] " simon.marchi at polymtl dot ca
  2021-10-19 19:17 ` kadler at us dot ibm.com
@ 2022-08-05 21:26 ` kadler at us dot ibm.com
  2022-08-05 21:44 ` simark at simark dot ca
  3 siblings, 0 replies; 5+ messages in thread
From: kadler at us dot ibm.com @ 2022-08-05 21:26 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28471

--- Comment #3 from Kevin Adler <kadler at us dot ibm.com> ---
I saw that binutils 2.39 was released recently and going through the git
changes I saw commit c0abbd96b4dc45249daffbd2b00dfa46cf3fd5aa from Simon which
looks like it should fix this issue (though I have not tested it yet).

Unfortunately I had to put my GDB work down and was unable to provide a patch
with my changes, but it looks like it got fixed in the end and this issue can
be closed. Thanks, Simon!

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug gdb/28471] Internal error on Assertion `pid != 0' on AIX
  2021-10-18 23:38 [Bug gdb/28471] New: Internal error on Assertion `pid != 0' on AIX kadler at us dot ibm.com
                   ` (2 preceding siblings ...)
  2022-08-05 21:26 ` kadler at us dot ibm.com
@ 2022-08-05 21:44 ` simark at simark dot ca
  3 siblings, 0 replies; 5+ messages in thread
From: simark at simark dot ca @ 2022-08-05 21:44 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28471

Simon Marchi <simark at simark dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |simark at simark dot ca
         Resolution|---                         |FIXED

--- Comment #4 from Simon Marchi <simark at simark dot ca> ---
Oh, thanks.  I forgot about this ticket, so I didn't mention it in the commit
message.  I'll assume it works for you, so I'll close it, but feel free to
re-open it if it's not the case.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-08-05 21:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-18 23:38 [Bug gdb/28471] New: Internal error on Assertion `pid != 0' on AIX kadler at us dot ibm.com
2021-10-19 18:15 ` [Bug gdb/28471] " simon.marchi at polymtl dot ca
2021-10-19 19:17 ` kadler at us dot ibm.com
2022-08-05 21:26 ` kadler at us dot ibm.com
2022-08-05 21:44 ` simark at simark dot ca

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).