public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Fix AIX thread NULL assertion failure during fork
@ 2023-11-20  7:25 Aditya Kamath1
  2023-11-20 11:27 ` Ulrich Weigand
  0 siblings, 1 reply; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-20  7:25 UTC (permalink / raw)
  To: Ulrich Weigand, Aditya Kamath1 via Gdb-patches; +Cc: Sangamesh Mallayya


[-- Attachment #1.1: Type: text/plain, Size: 6132 bytes --]

Respected community members,

Hi,

Please find attached a patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}
This patch is a fix to assertion switch_to_thread: Assertion `thr != NULL' in AIX.

Reproducing this in AIX:-
Consider the program pasted below this email named as multi-thread-fork.c.

This program creates two threads and each thread forks. Only in the case where a user sets to follow the child after fork this assertion failure is seen. In remaining cases {detach_on_fork = on and follow parent, detach on fork = off and follow child and detach on fork off and follow parent} we can debug the code multi-thread-fork.c.

For example, consider output 1 pasted below this email to see the failure.

The root cause:-

Assume we have set detach_on_fork = on and set follow-fork-mode child.
In AIX, on a fork () event we set our status and return our parent ptid from rs6000-aix-nat.c..
Once the object file of the new_inferior or child process is loaded we call pd_enable () to set our thread target and sync our threadlists. In our sync_threadlists we have pbuf having our pthread library threads and gbuf having our GDB threads known to GDB core that have been registered.

The condition if (gptid.is_pid ()) in the function sync_threadlists () will hit when the child process becomes threaded and the child process from ptid_t (pid, 0, 0) is changed to ptid_t (pid, 0, uthid).

After this we return this ptid and the control is now onto GDB generic code where in the infrun.c code, a decision is made in the function follow_fork () in the case TARGET_WAITKIND_FORKED: at line number 897 that if we follow the child, we should switch to thread child of parent as shown below

if (follow_child)
  +898            {
  +899          tp = parent_targ->find_thread (child);
  +900          switch_to_thread (tp);
……..

Here the child is a ptid_t unfortunately with a ptid_t (pid, 0 ,0) instead of ptid_t (pid, 0, tid) which we changed in the sync_threadlists () of aix-thread.c.. So, when a switch to thread here happens ptid_t (pid, 0 ,0) GDB is surprised that this thread does not exist leading to assertion failure.

The FIX:-

While I cannot say with 100% surety that from where GDB core got this ptid and why it did not update to ptid_t (pid, 0, tid) , my observation post debugging is that GDB core would have got ptid_t(pid, 0, 0) from the rs6000-aix-nat.c file, inside the wait () where we did inform GDB by setting a status that this a child process belonging a parent process on a fork event. GDB could not change this ptid it got during the fork event, even though we changed it later via sync_threadlists () from aix-thread.c for the threaded event. Having said that, since in rs6000-aix-nat.c we do not handle any thread related debug behaviour, in our aix-thread.c file I have added a condition such that if there is only one thread that belongs to a process then that process is not multithreaded and need to have thread ptid changed. This fixed the issue as show below this email in Output 2.

Kindly let me know what you think. Have a nice day ahead.

Thanks and regards,
Aditya.



Multi-thread-fork.c
# cat ~/gdb_tests/multi-thread-fork.c
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t p;
    p = fork();
    if(p<0)
    {
      perror("fork fail");
      exit(1);
    }
    // child process because return value zero
    else if ( p == 0)
        printf("Hello from Child!\n");

    // parent process because return value non-zero.
    else
        printf("Hello from Parent!\n");
  while (1); /* break here */
}

int
main (void)
{
  int i;

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_t thread;
      int res;

      res = pthread_create (&thread, NULL,
                            thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    sleep (1);

  return 0;
}


Output 1:-

./gdb ~/gdb_tests/multi-thread-fork
GNU gdb (GDB) 15.0.50.20231117-git
Copyright (C) 2023 Free Software Foundation, Inc.---
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 7602578]
[New inferior 2 (process 7602578)]
[Detaching after fork from parent process 6750654]
Hello from Parent!
[Inferior 1 (process 6750654) detached]
Hello from Child!
Hello from Parent!
thread.c:1380: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x1008e05db ???
0x1008e07a3 ???
0x1000437f7 ???-------
x100000aff ???
0x100000583 ???
---------------------
thread.c:1380: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

Output 2:-

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 11403720]
[New inferior 2 (process 11403720)]
[Detaching after fork from parent process 16515532]
[Inferior 1 (process 16515532) detached]
Hello from Parent!
Hello from Child!
Hello from Parent!
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to process 11403720]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb)

[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 1161 bytes --]

From 58bb04ea840a5988354b855eea0f22c80d0a936b Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Mon, 20 Nov 2023 01:01:35 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit this assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

This patch is a fix for the same.
---
 gdb/aix-thread.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..03030dc9351 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -927,7 +927,11 @@ sync_threadlists (pid_t pid)
 	      /* This is to make the main process thread now look
 		 like a thread.  */
 
-	      if (gptid.is_pid ())
+		  if (pcount == gcount == 1)
+		{
+		    gi++;pi++;
+		}
+	    else if (gptid.is_pid ())
 		{
 		  tp = proc_target->find_thread (gptid);
 		  thread_change_ptid (proc_target, gptid, pptid);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-20  7:25 [PATCH] Fix AIX thread NULL assertion failure during fork Aditya Kamath1
@ 2023-11-20 11:27 ` Ulrich Weigand
  2023-11-21  7:30   ` Aditya Kamath1
  0 siblings, 1 reply; 12+ messages in thread
From: Ulrich Weigand @ 2023-11-20 11:27 UTC (permalink / raw)
  To: gdb-patches, Aditya Kamath1; +Cc: Sangamesh Mallayya

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>Assume we have set detach_on_fork = on and set follow-fork-mode child.
>In AIX, on a fork () event we set our status and return our parent ptid from rs6000-aix-nat.c..
>Once the object file of the new_inferior or child process is loaded we call pd_enable () to
>set our thread target and sync our threadlists. In our sync_threadlists we have pbuf having
>our pthread library threads and gbuf having our GDB threads known to GDB core that have been
>registered. 

>While I cannot say with 100% surety that from where GDB core got this ptid and why it did not
>update to ptid_t (pid, 0, tid) , my observation post debugging is that GDB core would have got
>ptid_t(pid, 0, 0) from the rs6000-aix-nat.c file, inside the wait () where we did inform GDB
>by setting a status that this a child process belonging a parent process on a fork event.
>GDB could not change this ptid it got during the fork event, even though we changed it later
>via sync_threadlists () from aix-thread.c for the threaded event.

So here's the piece I do not fully understand: Yes, wait () inside rs6000-aix-nat.c
will set the ourstatus child_ptid to a non-threaded ptid.  But that's for a newly
created child process that should not *be* threaded at this time, right?

So how is it possible that in between wait () setting the child_ptid and infrun.c
using it to switch to the child, the child is becoming multi-threaded?  Where is
the sync_threadlists () call that makes this happen?

I'm aware that the wait () in aix-thread.c (which is the caller of the rs6000-aix-nat.c
one) does perform a pd_enable / sync_threadlists, but only on the *parent*, not
on the child.  That should happen only later.

I think we should understand better how this could have happened.

If there is a good reason why the child can already be multi-threaded, then
one option to fix this would be to switch ourstatus->child_ptid to the multi-
threaded version in the *aix-thread.c* version of wait (), just like it
switches the returned ptid.

Bye,
Ulrich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-20 11:27 ` Ulrich Weigand
@ 2023-11-21  7:30   ` Aditya Kamath1
  2023-11-21 12:17     ` Ulrich Weigand
  0 siblings, 1 reply; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-21  7:30 UTC (permalink / raw)
  To: Ulrich Weigand, gdb-patches; +Cc: Sangamesh Mallayya

[-- Attachment #1: Type: text/plain, Size: 5928 bytes --]

Hi Ulrich and community,

>So here's the piece I do not fully understand: Yes, wait () inside rs6000-aix-nat.c
>will set the ourstatus child_ptid to a non-threaded ptid.  But that's for a newly
>created child process that should not *be* threaded at this time, right?

Yes. So till here it is correct and we are on the same page here. The child ptid is ptid_t (pid, 0, 0). Here child should not be threaded. And therefore we have registered to the GDB core that the parent has a non-threaded child.

>So how is it possible that in between wait () setting the child_ptid and infrun.c
>using it to switch to the child, the child is becoming multi-threaded?  Where is
>the sync_threadlists () call that makes this happen?

>I think we should understand better how this could have happened.

I’m sorry I missed an information to tell you. So the parent process is loaded it is multi-threaded, child is loaded and through wait we have informed that fork () event has happened and given the GDB core its required information.

This child now will have its object file which will be loaded soon. So new_objfile () is called which will inturn call pd_enable () and this function will call pd_activate () then pd_update (), then sync_threadlists (). Once it is in sync_threadlists () its ptid will get synced to ptid_t (pid, 0, utid) since cmp_result will be positive, pbuf has a user thread ID but gbuf does not and has a ptid which is non threaded process. This where the mess happens and we end up changing the ptid via thread_change_ptid (). After this we know that child has threaded ptid but GDB core is still using ptid_t (pid, 0, 0)..

Perhaps GDB core will update this ptid later. I am not sure of that. But yes, we need to stop pd_activate () from syncing threadlists when the call is made for a child process whose object file is just loaded and GDB core is yet to switch to this thread post detaching the parent process since the user has set his debugging options like that. If we recall we check this inf->in_initial_library_scan. But in this case, this flag is not able to stop this bug from happening.

That is why in my patch sent in the previous email I was checking that is there is only one thread that a process has then do not change the ptid to a threaded one.

So yeah this is the thought process. Let me know what you think. I am pasting the output where I have print in pd_update () and pd_enable (). We can clearly see why this is happening. Hope it helps.

Have a nice day ahead.

Thanks and regards,
Aditya.

-------------------------------------------------------
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
pd_update pid = 9044280
pid in sync_threadlists () is 9044280
pd_update pid = 9044280
pid in sync_threadlists () is 9044280
pd_update pid = 9044280
pid in sync_threadlists () is 9044280
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 13763052]
[New inferior 2 (process 13763052)]
[Detaching after fork from parent process 9044280]
Hello from Parent!
[Inferior 1 (process 9044280) detached]
Hello from Child!
Hello from Parent!
In pd_enable with pid 13763052
pd_update pid = 13763052
pid in sync_threadlists () is 13763052
thread.c:1385: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.



From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Monday, 20 November 2023 at 4:57 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>Assume we have set detach_on_fork = on and set follow-fork-mode child.
>In AIX, on a fork () event we set our status and return our parent ptid from rs6000-aix-nat.c..
>Once the object file of the new_inferior or child process is loaded we call pd_enable () to
>set our thread target and sync our threadlists. In our sync_threadlists we have pbuf having
>our pthread library threads and gbuf having our GDB threads known to GDB core that have been
>registered.

>While I cannot say with 100% surety that from where GDB core got this ptid and why it did not
>update to ptid_t (pid, 0, tid) , my observation post debugging is that GDB core would have got
>ptid_t(pid, 0, 0) from the rs6000-aix-nat.c file, inside the wait () where we did inform GDB
>by setting a status that this a child process belonging a parent process on a fork event.
>GDB could not change this ptid it got during the fork event, even though we changed it later
>via sync_threadlists () from aix-thread.c for the threaded event.

So here's the piece I do not fully understand: Yes, wait () inside rs6000-aix-nat.c
will set the ourstatus child_ptid to a non-threaded ptid.  But that's for a newly
created child process that should not *be* threaded at this time, right?

So how is it possible that in between wait () setting the child_ptid and infrun.c
using it to switch to the child, the child is becoming multi-threaded?  Where is
the sync_threadlists () call that makes this happen?

I'm aware that the wait () in aix-thread.c (which is the caller of the rs6000-aix-nat.c
one) does perform a pd_enable / sync_threadlists, but only on the *parent*, not
on the child.  That should happen only later.

I think we should understand better how this could have happened.

If there is a good reason why the child can already be multi-threaded, then
one option to fix this would be to switch ourstatus->child_ptid to the multi-
threaded version in the *aix-thread.c* version of wait (), just like it
switches the returned ptid.

Bye,
Ulrich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-21  7:30   ` Aditya Kamath1
@ 2023-11-21 12:17     ` Ulrich Weigand
  2023-11-22 10:48       ` Aditya Kamath1
  0 siblings, 1 reply; 12+ messages in thread
From: Ulrich Weigand @ 2023-11-21 12:17 UTC (permalink / raw)
  To: gdb-patches, Aditya Kamath1; +Cc: Sangamesh Mallayya

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>>So how is it possible that in between wait () setting the child_ptid and infrun.c
>>using it to switch to the child, the child is becoming multi-threaded?  Where is
>>the sync_threadlists () call that makes this happen?
>
>>I think we should understand better how this could have happened.
> 
>I’m sorry I missed an information to tell you. So the parent process is loaded it is
>multi-threaded, child is loaded and through wait we have informed that fork () event
>has happened and given the GDB core its required information.
> 
>This child now will have its object file which will be loaded soon. So new_objfile ()
>is called which will inturn call pd_enable () and this function will call pd_activate ()
>then pd_update (), then sync_threadlists ().

Ah, I see. new_objfile is called from clone_program_space, which is
called from within follow_fork_inferior.  If this changes the child's
main thread ptid under the covers, then any future use of child_ptid
in follow_fork_inferior / follow_fork will be incorrect.

Now, the question is whether new_objfile should actually do that.
I believe that in current GDB, the answer is actually no.  While
in the past, platform-specific thread targets were indeed supposed
to do that, it seems this has been changed a while ago.  Today,
none of the other thread targets do that, as far as I can see.

Instead, there is a new ::update_thread_list target hook that
common code calls at appropriate times (when it expects ptids
to change).  I think you need to look into removing the call to
sync_threadlists from new_objfile and instead implement the
update_thread_list hook and perform that call there.

I believe you still need to perform the pd_enable / pd_activate
tasks during new_objfile, but no longer automatically call
pd_update.  The latter should be moved to update_thread_list.


Bye,
Ulrich



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-21 12:17     ` Ulrich Weigand
@ 2023-11-22 10:48       ` Aditya Kamath1
  2023-11-22 11:30         ` Ulrich Weigand
  0 siblings, 1 reply; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-22 10:48 UTC (permalink / raw)
  To: Ulrich Weigand, gdb-patches; +Cc: Sangamesh Mallayya


[-- Attachment #1.1: Type: text/plain, Size: 12181 bytes --]

Respected Ulrich and community members,

Thank you for your reply.

>Ah, I see. new_objfile is called from clone_program_space, which is
>called from within follow_fork_inferior.  If this changes the child's
>main thread ptid under the covers, then any future use of child_ptid
>in follow_fork_inferior / follow_fork will be incorrect.

>Instead, there is a new ::update_thread_list target hook that
>common code calls at appropriate times (when it expects ptids
>to change).  I think you need to look into removing the call to
>sync_threadlists from new_objfile and instead implement the
>update_thread_list hook and perform that call there.

>I believe you still need to perform the pd_enable / pd_activate
>tasks during new_objfile, but no longer automatically call
>pd_update.  The latter should be moved to update_thread_list.

Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}.

So this was the issue. Thank you for suggesting this. I appreciate your knowledge to tell us this. Please see Program 1 and the GDB outputs pasted below this email for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works in this test case and it looks like many other test cases as well are fixed from the GDB.base and GDB.threads test suite numbers mentioned below in a section.

Kindly let me know if this is okay.

Having said that we are missing some corner cases that I want to solve in a separate patch or the same depending on whatever you suggest. I get why we are moving pd_update () to update_thread_list () as it will take care of syncing thread lists whenever required.

Let us take one classic case of fork-print-inferior-event test case.
The code looks like
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char *argv[])
{
  pid_t child;

  child = fork ();
  switch (child)
    {
      case -1:
        abort ();
      case 0:
      default:
        break;
    }

  return 0;
}

Here if we observe the difference, Linux clearly shows inferior2 the child exited.




In Linux we get :-

gdb) set follow-fork-mode child
(gdb) r
Starting program: /home/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Attaching after Thread 0x7ffff7d85740 (LWP 57) fork to child process 58]
[New inferior 2 (process 58)]
[Detaching after fork from parent process 57]
[Inferior 1 (process 57) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Inferior 2 (process 58) exited normally]

}

In AIX we get:-

Reading symbols from //gdb_tests/fork-print-inferior-events...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/fork-print-inferior-events
[Attaching after process 22151436 fork to child process 23527898]
[New inferior 2 (process 23527898)]
[Detaching after fork from parent process 22151436]
[Inferior 1 (process 22151436) detached]

Thread 2.1 received signal SIGHUP, Hangup.
[Switching to process 23527898]
0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb)

-------------------------------------------------------------------------------------------------------------------------------------

But in AIX we are not able to catch this event currently. That is why this condition was added.
+  if (inferior_ptid.pid ())

If we do not add this condition, then get_thread_data_helper_for_ptid () will crash as pid is 0 with a pid!= NULL assertion since child exits.

Also, if we check the output of Program 1 below and the case where we follow-parent and detach-on-fork is off, the child processes are shown as process and not threads. But in Linux it is shown as threads when we use info threads command.

Only these two corner cases we have problems in AIX.

Kindly let me know where I went wrong in catching the exit event and the new child being converted to thread ptid event so that we can solve this for AIX and GDB. Somewhere my debug analysis went wrong while solving this issue. It would be of helpful to have better user experience though with the current patch , debugging works.

Have a nice day ahead.

Thanks and regards,
Aditya.

The test case numbers :-

before this patch GDB.base
=== gdb Summary ===
# of expected passes        28377
# of unexpected failures    2163
# of expected failures      15
# of known failures     10
# of unresolved testcases   154
# of untested testcases     79
# of unsupported tests      117
# of paths in test names    5
# of duplicate test names   4

After this patch GDB.base
=== gdb Summary ===
# of expected passes            28588
# of unexpected failures        2191
# of unexpected successes       1
# of expected failures          16
# of known failures             9
# of unresolved testcases       138
# of untested testcases         79
# of unsupported tests          116
# of paths in test names        5
# of duplicate test names       2

Before this patch GDB.threads
# of expected passes        1790
# of unexpected failures    610
# of expected failures      4
# of known failures     21
# of unresolved testcases   10
# of untested testcases     9
# of unsupported tests      82
# of paths in test names    3

After this patch GDB.thread
=== gdb Summary ===
# of expected passes            1871
# of unexpected failures        630
# of expected failures          4
# of known failures             21
# of unresolved testcases       2
# of untested testcases         9
# of unsupported tests          81
# of paths in test names        3

-------------------------------------------------------------------------------------------------------------------------

Program 1:-
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t p;
    p = fork();
    if(p<0)
    {
      perror("fork fail");
      exit(1);
    }
    // child process because return value zero
    else if ( p == 0)
        printf("Hello from Child!\n");
    // parent process because return value non-zero.
    else
        printf("Hello from Parent!\n");
  while (1); /* break here */
}

int
main (void)
{
  int i;

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_t thread;
      int res;

      res = pthread_create (&thread, NULL,
                            thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    sleep (1);

  return 0;
}


Follow-fork = parent detach-on-fork = on

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Detaching after fork from child process 15663474]
Hello from Child!
[Detaching after fork from child process 15532296]
Hello from Child!
Hello from Parent!
Hello from Parent!

Thread 1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)

Follow-fork = child detach-on-fork = parent

eading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 15532298]
[New inferior 2 (process 15532298)]
[Detaching after fork from parent process 22151424]
[Inferior 1 (process 22151424) detached]
Hello from Parent!
Hello from Parent!
Hello from Child!
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb)

Follow-fork = parent detach-on-fork = parent

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 23527894)]
[New inferior 3 (process 15663478)]
Hello from Parent!
Hello from Parent!

Thread 1.1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) info threads
  Id   Target Id                          Frame
* 1.1  Thread 1 (tid 30744339, running)   0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
  1.2  Thread 258 (tid 39264247, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
  1.3  Thread 515 (tid 8984493, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
  2.1  process 23527894                   0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
  3.1  process 15663478                   0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)

Follow-fork = child detach-on-fork = parent

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 15663480]
[New inferior 2 (process 15663480)]
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to process 15663480]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb)

From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Tuesday, 21 November 2023 at 5:47 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>>So how is it possible that in between wait () setting the child_ptid and infrun.c
>>using it to switch to the child, the child is becoming multi-threaded?  Where is
>>the sync_threadlists () call that makes this happen?
>
>>I think we should understand better how this could have happened.
>
>I’m sorry I missed an information to tell you. So the parent process is loaded it is
>multi-threaded, child is loaded and through wait we have informed that fork () event
>has happened and given the GDB core its required information.
>
>This child now will have its object file which will be loaded soon. So new_objfile ()
>is called which will inturn call pd_enable () and this function will call pd_activate ()
>then pd_update (), then sync_threadlists ().

Ah, I see. new_objfile is called from clone_program_space, which is
called from within follow_fork_inferior.  If this changes the child's
main thread ptid under the covers, then any future use of child_ptid
in follow_fork_inferior / follow_fork will be incorrect.

Now, the question is whether new_objfile should actually do that.
I believe that in current GDB, the answer is actually no.  While
in the past, platform-specific thread targets were indeed supposed
to do that, it seems this has been changed a while ago.  Today,
none of the other thread targets do that, as far as I can see.

Instead, there is a new ::update_thread_list target hook that
common code calls at appropriate times (when it expects ptids
to change).  I think you need to look into removing the call to
sync_threadlists from new_objfile and instead implement the
update_thread_list hook and perform that call there.

I believe you still need to perform the pd_enable / pd_activate
tasks during new_objfile, but no longer automatically call
pd_update.  The latter should be moved to update_thread_list.


Bye,
Ulrich


[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 2099 bytes --]

From 03a6b2336bfd721fa217b089be51f0701c7334ad Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Wed, 22 Nov 2023 04:03:17 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit thr!= NULL assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

The way this happened was when a new child process is born, its object file will be loaded, calling the new_objfile ()
in aix-thread.c file from clone_program_space, which is
called from within follow_fork_inferior. Therefore it end ups syncing threadlists via pd_update ().

This patch is a fix for the same where pd_update () is called in the wait () or in update_thread_list() hook only.
---
 gdb/aix-thread.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..717d0f2129c 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -136,6 +136,8 @@ class aix_thread_target final : public target_ops
   const char *extra_thread_info (struct thread_info *) override;
 
   ptid_t get_ada_task_ptid (long lwp, ULONGEST thread) override;
+
+  void update_thread_list () override;
 };
 
 static aix_thread_target aix_thread_ops;
@@ -1011,6 +1013,13 @@ pd_update (pid_t pid)
   return ptid;
 }
 
+void
+aix_thread_target::update_thread_list ()
+{
+  if (inferior_ptid.pid ())
+    pd_update (inferior_ptid.pid ());
+}
+
 /* Try to start debugging threads in the current process.
    If successful and there exists and we can find an event thread, return a ptid
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
@@ -1030,7 +1039,7 @@ pd_activate (pid_t pid)
       return ptid_t (pid);
     }
   data->pd_active = 1;
-  return pd_update (pid);
+  return ptid_t (pid);
 }
 
 /* An object file has just been loaded.  Check whether the current
-- 
2.41.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 10:48       ` Aditya Kamath1
@ 2023-11-22 11:30         ` Ulrich Weigand
  2023-11-22 13:58           ` Aditya Kamath1
  0 siblings, 1 reply; 12+ messages in thread
From: Ulrich Weigand @ 2023-11-22 11:30 UTC (permalink / raw)
  To: gdb-patches, Aditya Kamath1; +Cc: Sangamesh Mallayya

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>So this was the issue. Thank you for suggesting this. I appreciate your knowledge
>to tell us this. Please see Program 1 and the GDB outputs pasted below this email
>for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works
>in this test case and it looks like many other test cases as well are fixed from
>the GDB.base and GDB.threads test suite numbers mentioned below in a section.

Excellent, that's good to hear!

>Kindly let me know if this is okay.

I see one possible issue with the current patch - there is one
other existing caller to pd_activate in the ::wait routine:

      if (regcache_read_pc (regcache)
          - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
        return pd_activate (ptid.pid ());
    }

  return pd_update (ptid.pid ());

This is the one place other than update_thread_list where you do
in fact need to call pd_update.  However, now that you've changed
pd_activate to no longer call pd_update, in the case where the
code above executes "return pd_activate", that call is now missing.

I think a straightforward fix might be to simply remove the "return"
and just fall through into the pd_update after pd_activate returned.

(Note that the return value of pd_activate should then no longer
be used anywhere and can be removed.)


Also, it looks like your're only updating the *current* inferior
in update_thread_list.  Other targets update the thread lists of
*all* inferiors.  E.g. have a look at the linux-thread-db.c
version of update_thread_list.


>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Bye,
Ulrich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 11:30         ` Ulrich Weigand
@ 2023-11-22 13:58           ` Aditya Kamath1
  2023-11-22 14:14             ` Aditya Kamath1
  0 siblings, 1 reply; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-22 13:58 UTC (permalink / raw)
  To: Ulrich Weigand, gdb-patches; +Cc: Sangamesh Mallayya


[-- Attachment #1.1: Type: text/plain, Size: 11595 bytes --]

Respected Ulrich and community members,

Hi,

Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}.

You are genius. Thank you for your inputs. I have a lot to learn from you.

>I see one possible issue with the current patch - there is one
>other existing caller to pd_activate in the ::wait routine:

 >     if (regcache_read_pc (regcache)
 >         - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
 >       return pd_activate (ptid.pid ());

>This is the one place other than update_thread_list where you do
>in fact need to call pd_update.  However, now that you've changed
>pd_activate to no longer call pd_update, in the case where the
>code above executes "return pd_activate", that call is now missing.

>I think a straightforward fix might be to simply remove the "return"
>and just fall through into the pd_update after pd_activate returned.

This is done.

>Also, it looks like your're only updating the *current* inferior
>in update_thread_list.  Other targets update the thread lists of
>*all* inferiors.  E.g. have a look at the linux-thread-db.c
>version of update_thread_list.

Yeah, so from here on we are doing this. Please see in the patch.

>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was successful or the create thread event breakpoint was not successful. So the code execution did not reach pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside the update_threadlist () it succeeded to initialise a session later on when it was called and then we were able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are awesome here.

Kindly let me know if this patch is good. Kindly push if there are no changes needed.

Let me also tell you why there are 2 failures out 32 as of now in foll-fork-print-inferiors.exp test case in which we are not able to see the message if child exits when we follow child.

The code is like this.
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char *argv[])
{
  pid_t child;

  child = fork ();
  switch (child)
    {
      case -1:
        abort ();
      case 0:
      default:
        break;
    }

  return 0;
}

So when pd_enable () is called and we succeed in pthdb_session_pthreaded (), the libraries are still not loaded or breakpoint is not se and we are unable to go to pd_activate() leaving our pd_active to be 0. This means pd_update () cannot go to sync_threadlists (). That is why the child is still ptid_t (pid, 0, 0), and by the time we could do something we receive a HUP signal and the code exits. That is the issue. Please see the output below. We see this only when we follow parent. When we follow parent all is well irrespective of detach-on-fork.

Reading symbols from //gdb_tests/fork-print-inferior-events...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/fork-print-inferior-events
[Attaching after process 7340304 fork to child process 24117514]
[New inferior 2 (process 24117514)]
[Detaching after fork from parent process 7340304]
[Inferior 1 (process 7340304) detached]

Thread 2.1 received signal SIGHUP, Hangup.
[Switching to process 24117514]
0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb) info threads
  Id   Target Id         Frame
* 2.1  process 24117514  0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb) q

Have a nice day ahead.

Thanks and regards,
Aditya.


Program 1:-
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t p;
    p = fork();
    if(p<0)
    {
      perror("fork fail");
      exit(1);
    }
    // child process because return value zero
    else if ( p == 0)
        printf("Hello from Child!\n");
    // parent process because return value non-zero.
    else
        printf("Hello from Parent!\n");
  while (1); /* break here */
}

int
main (void)
{
  int i;

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_t thread;
      int res;

      res = pthread_create (&thread, NULL,
                            thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    sleep (1);

  return 0;
}
Follow-fork-mode = parent, detach-on-fork = on
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Detaching after fork from child process 11338010]
Hello from Child!
Hello from Parent!
[Detaching after fork from child process 16515516]
Hello from Child!
Hello from Parent!

Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 258]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
  1    Thread 1 (tid 8853293, running)    warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
* 2    Thread 258 (tid 10098629, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
0x0) at //gdb_tests/multi-thread-fork.c:50
  3    Thread 515 (tid 9443313, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)

Follow-fork-mode = child, detach-on-fork = on

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 16515518]
[New inferior 2 (process 16515518)]
[Detaching after fork from parent process 7340290]
[Inferior 1 (process 7340290) detached]
Hello from Parent!
Hello from Child!
Hello from Parent!
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
* 2.1  Thread 515 (tid 76879955, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)
Follow-fork-mode = parent, detach-on-fork = off

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 24117510)]
Hello from Parent!
[New inferior 3 (process 11338014)]
Hello from Parent!

Thread 1.1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) info threads
  Id   Target Id                          Frame
* 1.1  Thread 1 (tid 7870387, running)    0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
  1.2  Thread 258 (tid 10098633, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
  1.3  Thread 515 (tid 9443317, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
  2.1  Thread 515 (tid 8132439, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
  3.1  Thread 258 (tid 77533225, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)

Follow-fork-mode = child, detach-on-fork = off

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 11338016]
[New inferior 2 (process 11338016)]
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
  1.1  Thread 1 (tid 113118883, running)  warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
  1.2  Thread 258 (tid 9115441, sleeping) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
  1.3  Thread 515 (tid 7870389, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
* 2.1  Thread 515 (tid 9443319, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)



From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Wednesday, 22 November 2023 at 5:00 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>So this was the issue. Thank you for suggesting this. I appreciate your knowledge
>to tell us this. Please see Program 1 and the GDB outputs pasted below this email
>for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works
>in this test case and it looks like many other test cases as well are fixed from
>the GDB.base and GDB.threads test suite numbers mentioned below in a section.

Excellent, that's good to hear!

>Kindly let me know if this is okay.

I see one possible issue with the current patch - there is one
other existing caller to pd_activate in the ::wait routine:

      if (regcache_read_pc (regcache)
          - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
        return pd_activate (ptid.pid ());
    }

  return pd_update (ptid.pid ());

This is the one place other than update_thread_list where you do
in fact need to call pd_update.  However, now that you've changed
pd_activate to no longer call pd_update, in the case where the
code above executes "return pd_activate", that call is now missing.

I think a straightforward fix might be to simply remove the "return"
and just fall through into the pd_update after pd_activate returned.

(Note that the return value of pd_activate should then no longer
be used anywhere and can be removed.)


Also, it looks like your're only updating the *current* inferior
in update_thread_list.  Other targets update the thread lists of
*all* inferiors.  E.g. have a look at the linux-thread-db.c
version of update_thread_list.


>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Bye,
Ulrich

[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 2734 bytes --]

From e87e4085af2f7d4a6f899e8ce05baae6a718395f Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Wed, 22 Nov 2023 07:33:51 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit thr!= NULL assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

The way this happened was when a new child process is born, its object file will be loaded, calling the new_objfile ()
in aix-thread.c file from clone_program_space, which is
called from within follow_fork_inferior. Therefore it end ups syncing threadlists via pd_update ().

This patch is a fix for the same where pd_update () is called in the wait () or in update_thread_list() hook only.
---
 gdb/aix-thread.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..b777db5e06c 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -136,6 +136,8 @@ class aix_thread_target final : public target_ops
   const char *extra_thread_info (struct thread_info *) override;
 
   ptid_t get_ada_task_ptid (long lwp, ULONGEST thread) override;
+
+  void update_thread_list () override;
 };
 
 static aix_thread_target aix_thread_ops;
@@ -1015,7 +1017,7 @@ pd_update (pid_t pid)
    If successful and there exists and we can find an event thread, return a ptid
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
 
-static ptid_t
+static void 
 pd_activate (pid_t pid)
 {
   int status;
@@ -1030,9 +1032,29 @@ pd_activate (pid_t pid)
       return ptid_t (pid);
     }
   data->pd_active = 1;
-  return pd_update (pid);
+  return;
+}
+
+/* AIX implementation of update_thread_list.  */
+
+void
+aix_thread_target::update_thread_list ()
+{
+  struct thread_db_info *info;
+  struct aix_thread_variables *data;
+  for (inferior *inf : all_inferiors ())
+	{
+	  if (inf->pid == 0)
+		continue;
+
+	  data = get_aix_thread_variables_data (inf);
+
+	  pd_activate (inf->pid);
+	  pd_update (inf->pid);
+	}
 }
 
+
 /* An object file has just been loaded.  Check whether the current
    application is pthreaded, and if so, prepare for thread debugging.  */
 
@@ -1216,7 +1238,7 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 
       if (regcache_read_pc (regcache)
 	  - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
-	return pd_activate (ptid.pid ());
+		pd_activate (ptid.pid ());
     }
 
   return pd_update (ptid.pid ());
-- 
2.41.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 13:58           ` Aditya Kamath1
@ 2023-11-22 14:14             ` Aditya Kamath1
  2023-11-22 15:33               ` Ulrich Weigand
  0 siblings, 1 reply; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-22 14:14 UTC (permalink / raw)
  To: Ulrich Weigand, gdb-patches; +Cc: Sangamesh Mallayya


[-- Attachment #1.1: Type: text/plain, Size: 12971 bytes --]

Respected Ulrich and community members,

>>Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () >>was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was >>successful or the create thread event breakpoint was not successful. So the code execution did not reach >>pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside >?>>the update_threadlist () it succeeded to initialise a session later on when it was called and then we were >>able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are >>awesome here.

I can confirm that this condition
/* When attaching / handling fork child, don't try activating
     thread debugging until we know about all shared libraries.  */
  if (inf->in_initial_library_scan)
    return;

is what is the reason we fail to reach pd_activate ().. Sorry for not being clear in my previous email..

From: Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Date: Wednesday, 22 November 2023 at 7:28 PM
To: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>, gdb-patches@sourceware.org <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Respected Ulrich and community members,

Hi,

Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}.

You are genius. Thank you for your inputs. I have a lot to learn from you.

>I see one possible issue with the current patch - there is one
>other existing caller to pd_activate in the ::wait routine:

 >     if (regcache_read_pc (regcache)
 >         - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
 >       return pd_activate (ptid.pid ());

>This is the one place other than update_thread_list where you do
>in fact need to call pd_update.  However, now that you've changed
>pd_activate to no longer call pd_update, in the case where the
>code above executes "return pd_activate", that call is now missing.

>I think a straightforward fix might be to simply remove the "return"
>and just fall through into the pd_update after pd_activate returned.

This is done.


>Also, it looks like your're only updating the *current* inferior
>in update_thread_list.  Other targets update the thread lists of
>*all* inferiors.  E.g. have a look at the linux-thread-db.c
>version of update_thread_list.

Yeah, so from here on we are doing this. Please see in the patch.

>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was successful or the create thread event breakpoint was not successful. So the code execution did not reach pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside the update_threadlist () it succeeded to initialise a session later on when it was called and then we were able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are awesome here.

Kindly let me know if this patch is good. Kindly push if there are no changes needed.

Let me also tell you why there are 2 failures out 32 as of now in foll-fork-print-inferiors.exp test case in which we are not able to see the message if child exits when we follow child.

The code is like this.
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char *argv[])
{
  pid_t child;

  child = fork ();
  switch (child)
    {
      case -1:
        abort ();
      case 0:
      default:
        break;
    }

  return 0;
}

So when pd_enable () is called and we succeed in pthdb_session_pthreaded (), the libraries are still not loaded or breakpoint is not se and we are unable to go to pd_activate() leaving our pd_active to be 0. This means pd_update () cannot go to sync_threadlists (). That is why the child is still ptid_t (pid, 0, 0), and by the time we could do something we receive a HUP signal and the code exits. That is the issue. Please see the output below. We see this only when we follow parent. When we follow parent all is well irrespective of detach-on-fork.

Reading symbols from //gdb_tests/fork-print-inferior-events...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/fork-print-inferior-events
[Attaching after process 7340304 fork to child process 24117514]
[New inferior 2 (process 24117514)]
[Detaching after fork from parent process 7340304]
[Inferior 1 (process 7340304) detached]

Thread 2.1 received signal SIGHUP, Hangup.
[Switching to process 24117514]
0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb) info threads
  Id   Target Id         Frame
* 2.1  process 24117514  0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb) q

Have a nice day ahead.

Thanks and regards,
Aditya.



Program 1:-
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t p;
    p = fork();
    if(p<0)
    {
      perror("fork fail");
      exit(1);
    }
    // child process because return value zero
    else if ( p == 0)
        printf("Hello from Child!\n");
    // parent process because return value non-zero.
    else
        printf("Hello from Parent!\n");
  while (1); /* break here */
}

int
main (void)
{
  int i;

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_t thread;
      int res;

      res = pthread_create (&thread, NULL,
                            thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    sleep (1);

  return 0;
}
Follow-fork-mode = parent, detach-on-fork = on
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Detaching after fork from child process 11338010]
Hello from Child!
Hello from Parent!
[Detaching after fork from child process 16515516]
Hello from Child!
Hello from Parent!

Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 258]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
  1    Thread 1 (tid 8853293, running)    warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
* 2    Thread 258 (tid 10098629, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
0x0) at //gdb_tests/multi-thread-fork.c:50
  3    Thread 515 (tid 9443313, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)

Follow-fork-mode = child, detach-on-fork = on

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 16515518]
[New inferior 2 (process 16515518)]
[Detaching after fork from parent process 7340290]
[Inferior 1 (process 7340290) detached]
Hello from Parent!
Hello from Child!
Hello from Parent!
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
* 2.1  Thread 515 (tid 76879955, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)
Follow-fork-mode = parent, detach-on-fork = off

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 24117510)]
Hello from Parent!
[New inferior 3 (process 11338014)]
Hello from Parent!

Thread 1.1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) info threads
  Id   Target Id                          Frame
* 1.1  Thread 1 (tid 7870387, running)    0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
  1.2  Thread 258 (tid 10098633, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
  1.3  Thread 515 (tid 9443317, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
  2.1  Thread 515 (tid 8132439, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
  3.1  Thread 258 (tid 77533225, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)


Follow-fork-mode = child, detach-on-fork = off

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 11338016]
[New inferior 2 (process 11338016)]
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
  1.1  Thread 1 (tid 113118883, running)  warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
  1.2  Thread 258 (tid 9115441, sleeping) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
  1.3  Thread 515 (tid 7870389, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
* 2.1  Thread 515 (tid 9443319, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)



From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Wednesday, 22 November 2023 at 5:00 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>So this was the issue. Thank you for suggesting this. I appreciate your knowledge
>to tell us this. Please see Program 1 and the GDB outputs pasted below this email
>for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works
>in this test case and it looks like many other test cases as well are fixed from
>the GDB.base and GDB.threads test suite numbers mentioned below in a section.

Excellent, that's good to hear!

>Kindly let me know if this is okay.

I see one possible issue with the current patch - there is one
other existing caller to pd_activate in the ::wait routine:

      if (regcache_read_pc (regcache)
          - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
        return pd_activate (ptid.pid ());
    }

  return pd_update (ptid.pid ());

This is the one place other than update_thread_list where you do
in fact need to call pd_update.  However, now that you've changed
pd_activate to no longer call pd_update, in the case where the
code above executes "return pd_activate", that call is now missing.

I think a straightforward fix might be to simply remove the "return"
and just fall through into the pd_update after pd_activate returned.

(Note that the return value of pd_activate should then no longer
be used anywhere and can be removed.)


Also, it looks like your're only updating the *current* inferior
in update_thread_list.  Other targets update the thread lists of
*all* inferiors.  E.g. have a look at the linux-thread-db.c
version of update_thread_list.


>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Bye,
Ulrich

[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 2734 bytes --]

From e87e4085af2f7d4a6f899e8ce05baae6a718395f Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Wed, 22 Nov 2023 07:33:51 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit thr!= NULL assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

The way this happened was when a new child process is born, its object file will be loaded, calling the new_objfile ()
in aix-thread.c file from clone_program_space, which is
called from within follow_fork_inferior. Therefore it end ups syncing threadlists via pd_update ().

This patch is a fix for the same where pd_update () is called in the wait () or in update_thread_list() hook only.
---
 gdb/aix-thread.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..b777db5e06c 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -136,6 +136,8 @@ class aix_thread_target final : public target_ops
   const char *extra_thread_info (struct thread_info *) override;
 
   ptid_t get_ada_task_ptid (long lwp, ULONGEST thread) override;
+
+  void update_thread_list () override;
 };
 
 static aix_thread_target aix_thread_ops;
@@ -1015,7 +1017,7 @@ pd_update (pid_t pid)
    If successful and there exists and we can find an event thread, return a ptid
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
 
-static ptid_t
+static void 
 pd_activate (pid_t pid)
 {
   int status;
@@ -1030,9 +1032,29 @@ pd_activate (pid_t pid)
       return ptid_t (pid);
     }
   data->pd_active = 1;
-  return pd_update (pid);
+  return;
+}
+
+/* AIX implementation of update_thread_list.  */
+
+void
+aix_thread_target::update_thread_list ()
+{
+  struct thread_db_info *info;
+  struct aix_thread_variables *data;
+  for (inferior *inf : all_inferiors ())
+	{
+	  if (inf->pid == 0)
+		continue;
+
+	  data = get_aix_thread_variables_data (inf);
+
+	  pd_activate (inf->pid);
+	  pd_update (inf->pid);
+	}
 }
 
+
 /* An object file has just been loaded.  Check whether the current
    application is pthreaded, and if so, prepare for thread debugging.  */
 
@@ -1216,7 +1238,7 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 
       if (regcache_read_pc (regcache)
 	  - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
-	return pd_activate (ptid.pid ());
+		pd_activate (ptid.pid ());
     }
 
   return pd_update (ptid.pid ());
-- 
2.41.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 14:14             ` Aditya Kamath1
@ 2023-11-22 15:33               ` Ulrich Weigand
  2023-11-22 16:22                 ` Aditya Kamath1
  0 siblings, 1 reply; 12+ messages in thread
From: Ulrich Weigand @ 2023-11-22 15:33 UTC (permalink / raw)
  To: gdb-patches, Aditya Kamath1; +Cc: Sangamesh Mallayya

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>>Yes. So we have a little more to understand here. So once pd_enable () 
>>was called and then pd_activate () was called it looks like either 
>>the libraries were not fully loaded though the pthdb_session_pthreaded 
>>successful or the create thread event breakpoint was not successful.
>>So the code execution did not reach >>pd_activate (). Therefore our
>>child process was still ptid_t (pid, 0 ,0). When I used the pd_activate ()
>>inside the update_threadlist () it succeeded to initialise a session
>>later on when it was called and then we were able to set pd_active
>>and get to sync_threadlists ().
>
>I can confirm that this condition
>/* When attaching / handling fork child, don't try activating
>     thread debugging until we know about all shared libraries.  */
>  if (inf->in_initial_library_scan)
>    return;
>
>is what is the reason we fail to reach pd_activate ().. Sorry for not being clear in my previous email..

OK, so this raises two questions:

- I think we never should call pd_activate twice on a process
  where pd_active is already true.  This can now happen when
  you call pd_activate from update_thread_list.

- Do we even need the in_initial_library_scan check at all
  anymore? I seem to recall we added that as the
  sync_threadlists call could cause confusion during early
  startup.  But now that we don't call sync_threadlists
  from pd_activate any more, maybe we can simply remove that
  check completely?  And then, maybe we no longer need to
  call pd_active from update_thread_list.

Minor issues I noticed in the patch:

- Variable "data" in update_thread_list looks unused?
- Some whitespace issues - please watch the TAB settings,
  in the GDB sources a TAB should be 8 spaces.

Bye,
Ulrich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 15:33               ` Ulrich Weigand
@ 2023-11-22 16:22                 ` Aditya Kamath1
  2023-11-22 18:30                   ` Ulrich Weigand
  0 siblings, 1 reply; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-22 16:22 UTC (permalink / raw)
  To: Ulrich Weigand, gdb-patches; +Cc: Sangamesh Mallayya


[-- Attachment #1.1: Type: text/plain, Size: 4484 bytes --]

Respected Ulrich and community members,

Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}

>OK, so this raises two questions:

>- I think we never should call pd_activate twice on a process
>  where pd_active is already true.  This can now happen when
>  you call pd_activate from update_thread_list.

Yes.

>- Do we even need the in_initial_library_scan check at all
>  anymore?
No we do not it. It works without it, since now we use the update_thread_list () hook. Please see the output pasted below. This not required anymore. We used it as we wanted to control the pd_activate () call till libraries were loaded.

>Minor issues I noticed in the patch:

>- Variable "data" in update_thread_list looks unused?
>- Some whitespace issues - please watch the TAB settings,
>  in the GDB sources a TAB should be 8 spaces.

I set this. Now you should pd_update () aligned properly. I will take care of this going forward. Thank you.

If this patch is okay kindly push it. If not let me know.

Have a nice day ahead.

Thanks and regards,
Aditya.

-----------------------------------------------
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 23658790)]
Hello from Parent!
[New inferior 3 (process 8913402)]
Hello from Parent!
info threads

Thread 1.1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) info threads
  Id   Target Id                           Frame
* 1.1  Thread 1 (tid 9574159, running)     0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
  1.2  Thread 258 (tid 8853455, running)   thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
  1.3  Thread 515 (tid 9770981, running)   thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
  2.1  Thread 515 (tid 10360707, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
  3.1  Thread 258 (tid 113118727, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) q
A debugging session is active.
From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Wednesday, 22 November 2023 at 9:03 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>>Yes. So we have a little more to understand here. So once pd_enable ()
>>was called and then pd_activate () was called it looks like either
>>the libraries were not fully loaded though the pthdb_session_pthreaded
>>successful or the create thread event breakpoint was not successful.
>>So the code execution did not reach >>pd_activate (). Therefore our
>>child process was still ptid_t (pid, 0 ,0). When I used the pd_activate ()
>>inside the update_threadlist () it succeeded to initialise a session
>>later on when it was called and then we were able to set pd_active
>>and get to sync_threadlists ().
>
>I can confirm that this condition
>/* When attaching / handling fork child, don't try activating
>     thread debugging until we know about all shared libraries.  */
>  if (inf->in_initial_library_scan)
>    return;
>
>is what is the reason we fail to reach pd_activate ().. Sorry for not being clear in my previous email..

OK, so this raises two questions:

- I think we never should call pd_activate twice on a process
  where pd_active is already true.  This can now happen when
  you call pd_activate from update_thread_list.

- Do we even need the in_initial_library_scan check at all
  anymore? I seem to recall we added that as the
  sync_threadlists call could cause confusion during early
  startup.  But now that we don't call sync_threadlists
  from pd_activate any more, maybe we can simply remove that
  check completely?  And then, maybe we no longer need to
  call pd_active from update_thread_list.

Minor issues I noticed in the patch:

- Variable "data" in update_thread_list looks unused?
- Some whitespace issues - please watch the TAB settings,
  in the GDB sources a TAB should be 8 spaces.

Bye,
Ulrich

[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 3079 bytes --]

From 2bc9251be7c7342ef9776581685a60387aee8e55 Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Wed, 22 Nov 2023 10:13:35 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit thr!= NULL assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

The way this happened was when a new child process is born, its object file will be loaded, calling the new_objfile ()
in aix-thread.c file from clone_program_space, which is
called from within follow_fork_inferior. Therefore it end ups syncing threadlists via pd_update ().

This patch is a fix for the same where pd_update () is called in the wait () or in update_thread_list() hook only.
---
 gdb/aix-thread.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..c8147defc04 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -136,6 +136,8 @@ class aix_thread_target final : public target_ops
   const char *extra_thread_info (struct thread_info *) override;
 
   ptid_t get_ada_task_ptid (long lwp, ULONGEST thread) override;
+
+  void update_thread_list () override;
 };
 
 static aix_thread_target aix_thread_ops;
@@ -1015,7 +1017,7 @@ pd_update (pid_t pid)
    If successful and there exists and we can find an event thread, return a ptid
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
 
-static ptid_t
+static void
 pd_activate (pid_t pid)
 {
   int status;
@@ -1030,9 +1032,24 @@ pd_activate (pid_t pid)
       return ptid_t (pid);
     }
   data->pd_active = 1;
-  return pd_update (pid);
+  return;
 }
 
+/* AIX implementation of update_thread_list.  */
+
+void
+aix_thread_target::update_thread_list ()
+{
+  for (inferior *inf : all_inferiors ())
+    {
+      if (inf->pid == 0)
+	continue;
+
+      pd_update (inf->pid);
+    }
+}
+
+
 /* An object file has just been loaded.  Check whether the current
    application is pthreaded, and if so, prepare for thread debugging.  */
 
@@ -1077,11 +1094,6 @@ pd_enable (inferior *inf)
   current_inferior ()->push_target (&aix_thread_ops);
   data->pd_able = 1;
 
-  /* When attaching / handling fork child, don't try activating
-     thread debugging until we know about all shared libraries.  */
-  if (inf->in_initial_library_scan)
-    return;
-
   /* If we're debugging a core file or an attached inferior, the
      pthread library may already have been initialized, so try to
      activate thread debugging.  */
@@ -1216,7 +1228,7 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 
       if (regcache_read_pc (regcache)
 	  - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
-	return pd_activate (ptid.pid ());
+	pd_activate (ptid.pid ());
     }
 
   return pd_update (ptid.pid ());
-- 
2.41.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 16:22                 ` Aditya Kamath1
@ 2023-11-22 18:30                   ` Ulrich Weigand
  2023-11-23  6:06                     ` Aditya Kamath1
  0 siblings, 1 reply; 12+ messages in thread
From: Ulrich Weigand @ 2023-11-22 18:30 UTC (permalink / raw)
  To: gdb-patches, Aditya Kamath1; +Cc: Sangamesh Mallayya

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>If this patch is okay kindly push it. If not let me know. 

This version LGTM, I've applied the patch now.

Thanks,
Ulrich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix AIX thread NULL assertion failure during fork
  2023-11-22 18:30                   ` Ulrich Weigand
@ 2023-11-23  6:06                     ` Aditya Kamath1
  0 siblings, 0 replies; 12+ messages in thread
From: Aditya Kamath1 @ 2023-11-23  6:06 UTC (permalink / raw)
  To: Ulrich Weigand, gdb-patches; +Cc: Sangamesh Mallayya

[-- Attachment #1: Type: text/plain, Size: 517 bytes --]

Thanks :)

From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Thursday, 23 November 2023 at 12:00 AM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>If this patch is okay kindly push it. If not let me know.

This version LGTM, I've applied the patch now.

Thanks,
Ulrich

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-11-23  6:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-20  7:25 [PATCH] Fix AIX thread NULL assertion failure during fork Aditya Kamath1
2023-11-20 11:27 ` Ulrich Weigand
2023-11-21  7:30   ` Aditya Kamath1
2023-11-21 12:17     ` Ulrich Weigand
2023-11-22 10:48       ` Aditya Kamath1
2023-11-22 11:30         ` Ulrich Weigand
2023-11-22 13:58           ` Aditya Kamath1
2023-11-22 14:14             ` Aditya Kamath1
2023-11-22 15:33               ` Ulrich Weigand
2023-11-22 16:22                 ` Aditya Kamath1
2023-11-22 18:30                   ` Ulrich Weigand
2023-11-23  6:06                     ` Aditya Kamath1

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).