Respected Ulrich and community members, Thank you for your reply. >Ah, I see. new_objfile is called from clone_program_space, which is >called from within follow_fork_inferior. If this changes the child's >main thread ptid under the covers, then any future use of child_ptid >in follow_fork_inferior / follow_fork will be incorrect. >Instead, there is a new ::update_thread_list target hook that >common code calls at appropriate times (when it expects ptids >to change). I think you need to look into removing the call to >sync_threadlists from new_objfile and instead implement the >update_thread_list hook and perform that call there. >I believe you still need to perform the pd_enable / pd_activate >tasks during new_objfile, but no longer automatically call >pd_update. The latter should be moved to update_thread_list. Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}. So this was the issue. Thank you for suggesting this. I appreciate your knowledge to tell us this. Please see Program 1 and the GDB outputs pasted below this email for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works in this test case and it looks like many other test cases as well are fixed from the GDB.base and GDB.threads test suite numbers mentioned below in a section. Kindly let me know if this is okay. Having said that we are missing some corner cases that I want to solve in a separate patch or the same depending on whatever you suggest. I get why we are moving pd_update () to update_thread_list () as it will take care of syncing thread lists whenever required. Let us take one classic case of fork-print-inferior-event test case. The code looks like #include #include int main (int argc, char *argv[]) { pid_t child; child = fork (); switch (child) { case -1: abort (); case 0: default: break; } return 0; } Here if we observe the difference, Linux clearly shows inferior2 the child exited. In Linux we get :- gdb) set follow-fork-mode child (gdb) r Starting program: /home/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Attaching after Thread 0x7ffff7d85740 (LWP 57) fork to child process 58] [New inferior 2 (process 58)] [Detaching after fork from parent process 57] [Inferior 1 (process 57) detached] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 2 (process 58) exited normally] } In AIX we get:- Reading symbols from //gdb_tests/fork-print-inferior-events... (gdb) set follow-fork-mode child (gdb) r Starting program: /gdb_tests/fork-print-inferior-events [Attaching after process 22151436 fork to child process 23527898] [New inferior 2 (process 23527898)] [Detaching after fork from parent process 22151436] [Inferior 1 (process 22151436) detached] Thread 2.1 received signal SIGHUP, Hangup. [Switching to process 23527898] 0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o) (gdb) ------------------------------------------------------------------------------------------------------------------------------------- But in AIX we are not able to catch this event currently. That is why this condition was added. + if (inferior_ptid.pid ()) If we do not add this condition, then get_thread_data_helper_for_ptid () will crash as pid is 0 with a pid!= NULL assertion since child exits. Also, if we check the output of Program 1 below and the case where we follow-parent and detach-on-fork is off, the child processes are shown as process and not threads. But in Linux it is shown as threads when we use info threads command. Only these two corner cases we have problems in AIX. Kindly let me know where I went wrong in catching the exit event and the new child being converted to thread ptid event so that we can solve this for AIX and GDB. Somewhere my debug analysis went wrong while solving this issue. It would be of helpful to have better user experience though with the current patch , debugging works. Have a nice day ahead. Thanks and regards, Aditya. The test case numbers :- before this patch GDB.base === gdb Summary === # of expected passes 28377 # of unexpected failures 2163 # of expected failures 15 # of known failures 10 # of unresolved testcases 154 # of untested testcases 79 # of unsupported tests 117 # of paths in test names 5 # of duplicate test names 4 After this patch GDB.base === gdb Summary === # of expected passes 28588 # of unexpected failures 2191 # of unexpected successes 1 # of expected failures 16 # of known failures 9 # of unresolved testcases 138 # of untested testcases 79 # of unsupported tests 116 # of paths in test names 5 # of duplicate test names 2 Before this patch GDB.threads # of expected passes 1790 # of unexpected failures 610 # of expected failures 4 # of known failures 21 # of unresolved testcases 10 # of untested testcases 9 # of unsupported tests 82 # of paths in test names 3 After this patch GDB.thread === gdb Summary === # of expected passes 1871 # of unexpected failures 630 # of expected failures 4 # of known failures 21 # of unresolved testcases 2 # of untested testcases 9 # of unsupported tests 81 # of paths in test names 3 ------------------------------------------------------------------------------------------------------------------------- Program 1:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t p; p = fork(); if(p<0) { perror("fork fail"); exit(1); } // child process because return value zero else if ( p == 0) printf("Hello from Child!\n"); // parent process because return value non-zero. else printf("Hello from Parent!\n"); while (1); /* break here */ } int main (void) { int i; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { pthread_t thread; int res; res = pthread_create (&thread, NULL, thread_function, NULL); assert (res == 0); } while (1) sleep (1); return 0; } Follow-fork = parent detach-on-fork = on Reading symbols from //gdb_tests/multi-thread-fork... (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [Detaching after fork from child process 15663474] Hello from Child! [Detaching after fork from child process 15532296] Hello from Child! Hello from Parent! Hello from Parent! Thread 1 received signal SIGINT, Interrupt. 0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o) (gdb) Follow-fork = child detach-on-fork = parent eading symbols from //gdb_tests/multi-thread-fork... (gdb) set follow-fork-mode child (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [Attaching after Thread 515 fork to child process 15532298] [New inferior 2 (process 15532298)] [Detaching after fork from parent process 22151424] [Inferior 1 (process 22151424) detached] Hello from Parent! Hello from Parent! Hello from Child! Hello from Child! Thread 2.1 received signal SIGINT, Interrupt. [Switching to Thread 515] thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 50 while (1); /* break here */ (gdb) Follow-fork = parent detach-on-fork = parent Reading symbols from //gdb_tests/multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 23527894)] [New inferior 3 (process 15663478)] Hello from Parent! Hello from Parent! Thread 1.1 received signal SIGINT, Interrupt. 0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o) (gdb) info threads Id Target Id Frame * 1.1 Thread 1 (tid 30744339, running) 0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o) 1.2 Thread 258 (tid 39264247, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 1.3 Thread 515 (tid 8984493, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at //gdb_tests/multi-thread-fork.c:50 2.1 process 23527894 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o) 3.1 process 15663478 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o) (gdb) Follow-fork = child detach-on-fork = parent Reading symbols from //gdb_tests/multi-thread-fork... (gdb) set follow-fork-mode child (gdb) set detach-on-fork off (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [Attaching after Thread 515 fork to child process 15663480] [New inferior 2 (process 15663480)] Hello from Child! Thread 2.1 received signal SIGINT, Interrupt. [Switching to process 15663480] thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 50 while (1); /* break here */ (gdb) From: Ulrich Weigand Date: Tuesday, 21 November 2023 at 5:47 PM To: gdb-patches@sourceware.org , Aditya Kamath1 Cc: Sangamesh Mallayya Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork Aditya Kamath1 wrote: >>So how is it possible that in between wait () setting the child_ptid and infrun.c >>using it to switch to the child, the child is becoming multi-threaded? Where is >>the sync_threadlists () call that makes this happen? > >>I think we should understand better how this could have happened. > >I’m sorry I missed an information to tell you. So the parent process is loaded it is >multi-threaded, child is loaded and through wait we have informed that fork () event >has happened and given the GDB core its required information. > >This child now will have its object file which will be loaded soon. So new_objfile () >is called which will inturn call pd_enable () and this function will call pd_activate () >then pd_update (), then sync_threadlists (). Ah, I see. new_objfile is called from clone_program_space, which is called from within follow_fork_inferior. If this changes the child's main thread ptid under the covers, then any future use of child_ptid in follow_fork_inferior / follow_fork will be incorrect. Now, the question is whether new_objfile should actually do that. I believe that in current GDB, the answer is actually no. While in the past, platform-specific thread targets were indeed supposed to do that, it seems this has been changed a while ago. Today, none of the other thread targets do that, as far as I can see. Instead, there is a new ::update_thread_list target hook that common code calls at appropriate times (when it expects ptids to change). I think you need to look into removing the call to sync_threadlists from new_objfile and instead implement the update_thread_list hook and perform that call there. I believe you still need to perform the pd_enable / pd_activate tasks during new_objfile, but no longer automatically call pd_update. The latter should be moved to update_thread_list. Bye, Ulrich