Respected Ulrich and community members, >>Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () >>was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was >>successful or the create thread event breakpoint was not successful. So the code execution did not reach >>pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside >?>>the update_threadlist () it succeeded to initialise a session later on when it was called and then we were >>able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are >>awesome here. I can confirm that this condition /* When attaching / handling fork child, don't try activating thread debugging until we know about all shared libraries. */ if (inf->in_initial_library_scan) return; is what is the reason we fail to reach pd_activate ().. Sorry for not being clear in my previous email.. From: Aditya Kamath1 Date: Wednesday, 22 November 2023 at 7:28 PM To: Ulrich Weigand , gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork Respected Ulrich and community members, Hi, Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}. You are genius. Thank you for your inputs. I have a lot to learn from you. >I see one possible issue with the current patch - there is one >other existing caller to pd_activate in the ::wait routine: > if (regcache_read_pc (regcache) > - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr) > return pd_activate (ptid.pid ()); >This is the one place other than update_thread_list where you do >in fact need to call pd_update. However, now that you've changed >pd_activate to no longer call pd_update, in the case where the >code above executes "return pd_activate", that call is now missing. >I think a straightforward fix might be to simply remove the "return" >and just fall through into the pd_update after pd_activate returned. This is done. >Also, it looks like your're only updating the *current* inferior >in update_thread_list. Other targets update the thread lists of >*all* inferiors. E.g. have a look at the linux-thread-db.c >version of update_thread_list. Yeah, so from here on we are doing this. Please see in the patch. >Having said that we are missing some corner cases that I want to solve >in a separate patch or the same depending on whatever you suggest. Do the above change fix those corner cases? Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was successful or the create thread event breakpoint was not successful. So the code execution did not reach pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside the update_threadlist () it succeeded to initialise a session later on when it was called and then we were able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are awesome here. Kindly let me know if this patch is good. Kindly push if there are no changes needed. Let me also tell you why there are 2 failures out 32 as of now in foll-fork-print-inferiors.exp test case in which we are not able to see the message if child exits when we follow child. The code is like this. #include #include int main (int argc, char *argv[]) { pid_t child; child = fork (); switch (child) { case -1: abort (); case 0: default: break; } return 0; } So when pd_enable () is called and we succeed in pthdb_session_pthreaded (), the libraries are still not loaded or breakpoint is not se and we are unable to go to pd_activate() leaving our pd_active to be 0. This means pd_update () cannot go to sync_threadlists (). That is why the child is still ptid_t (pid, 0, 0), and by the time we could do something we receive a HUP signal and the code exits. That is the issue. Please see the output below. We see this only when we follow parent. When we follow parent all is well irrespective of detach-on-fork. Reading symbols from //gdb_tests/fork-print-inferior-events... (gdb) set follow-fork-mode child (gdb) r Starting program: /gdb_tests/fork-print-inferior-events [Attaching after process 7340304 fork to child process 24117514] [New inferior 2 (process 24117514)] [Detaching after fork from parent process 7340304] [Inferior 1 (process 7340304) detached] Thread 2.1 received signal SIGHUP, Hangup. [Switching to process 24117514] 0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o) (gdb) info threads Id Target Id Frame * 2.1 process 24117514 0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o) (gdb) q Have a nice day ahead. Thanks and regards, Aditya. Program 1:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t p; p = fork(); if(p<0) { perror("fork fail"); exit(1); } // child process because return value zero else if ( p == 0) printf("Hello from Child!\n"); // parent process because return value non-zero. else printf("Hello from Parent!\n"); while (1); /* break here */ } int main (void) { int i; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { pthread_t thread; int res; res = pthread_create (&thread, NULL, thread_function, NULL); assert (res == 0); } while (1) sleep (1); return 0; } Follow-fork-mode = parent, detach-on-fork = on Reading symbols from //gdb_tests/multi-thread-fork... (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [Detaching after fork from child process 11338010] Hello from Child! Hello from Parent! [Detaching after fork from child process 16515516] Hello from Child! Hello from Parent! Thread 2 received signal SIGINT, Interrupt. [Switching to Thread 258] thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 50 while (1); /* break here */ (gdb) info threads Id Target Id Frame 1 Thread 1 (tid 8853293, running) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) * 2 Thread 258 (tid 10098629, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at //gdb_tests/multi-thread-fork.c:50 3 Thread 515 (tid 9443313, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at //gdb_tests/multi-thread-fork.c:50 (gdb) Follow-fork-mode = child, detach-on-fork = on Reading symbols from //gdb_tests/multi-thread-fork... (gdb) set follow-fork-mode child (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [Attaching after Thread 515 fork to child process 16515518] [New inferior 2 (process 16515518)] [Detaching after fork from parent process 7340290] [Inferior 1 (process 7340290) detached] Hello from Parent! Hello from Child! Hello from Parent! Hello from Child! Thread 2.1 received signal SIGINT, Interrupt. [Switching to Thread 515] thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 50 while (1); /* break here */ (gdb) info threads Id Target Id Frame * 2.1 Thread 515 (tid 76879955, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at //gdb_tests/multi-thread-fork.c:50 (gdb) Follow-fork-mode = parent, detach-on-fork = off Reading symbols from //gdb_tests/multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 24117510)] Hello from Parent! [New inferior 3 (process 11338014)] Hello from Parent! Thread 1.1 received signal SIGINT, Interrupt. 0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o) (gdb) info threads Id Target Id Frame * 1.1 Thread 1 (tid 7870387, running) 0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o) 1.2 Thread 258 (tid 10098633, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 1.3 Thread 515 (tid 9443317, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at //gdb_tests/multi-thread-fork.c:50 2.1 Thread 515 (tid 8132439, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o) 3.1 Thread 258 (tid 77533225, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o) (gdb) Follow-fork-mode = child, detach-on-fork = off Reading symbols from //gdb_tests/multi-thread-fork... (gdb) set detach-on-fork off (gdb) set follow-fork-mode child (gdb) r Starting program: /gdb_tests/multi-thread-fork [New Thread 258] [New Thread 515] [Attaching after Thread 515 fork to child process 11338016] [New inferior 2 (process 11338016)] Hello from Child! Thread 2.1 received signal SIGINT, Interrupt. [Switching to Thread 515] thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50 50 while (1); /* break here */ (gdb) info threads Id Target Id Frame 1.1 Thread 1 (tid 113118883, running) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 1.2 Thread 258 (tid 9115441, sleeping) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 1.3 Thread 515 (tid 7870389, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o) * 2.1 Thread 515 (tid 9443319, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at //gdb_tests/multi-thread-fork.c:50 (gdb) From: Ulrich Weigand Date: Wednesday, 22 November 2023 at 5:00 PM To: gdb-patches@sourceware.org , Aditya Kamath1 Cc: Sangamesh Mallayya Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork Aditya Kamath1 wrote: >So this was the issue. Thank you for suggesting this. I appreciate your knowledge >to tell us this. Please see Program 1 and the GDB outputs pasted below this email >for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works >in this test case and it looks like many other test cases as well are fixed from >the GDB.base and GDB.threads test suite numbers mentioned below in a section. Excellent, that's good to hear! >Kindly let me know if this is okay. I see one possible issue with the current patch - there is one other existing caller to pd_activate in the ::wait routine: if (regcache_read_pc (regcache) - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr) return pd_activate (ptid.pid ()); } return pd_update (ptid.pid ()); This is the one place other than update_thread_list where you do in fact need to call pd_update. However, now that you've changed pd_activate to no longer call pd_update, in the case where the code above executes "return pd_activate", that call is now missing. I think a straightforward fix might be to simply remove the "return" and just fall through into the pd_update after pd_activate returned. (Note that the return value of pd_activate should then no longer be used anywhere and can be removed.) Also, it looks like your're only updating the *current* inferior in update_thread_list. Other targets update the thread lists of *all* inferiors. E.g. have a look at the linux-thread-db.c version of update_thread_list. >Having said that we are missing some corner cases that I want to solve >in a separate patch or the same depending on whatever you suggest. Do the above change fix those corner cases? Bye, Ulrich