Hi Ulrich and community,

>So here's the piece I do not fully understand: Yes, wait () inside rs6000-aix-nat.c
>will set the ourstatus child_ptid to a non-threaded ptid. But that's for a newly
>created child process that should not *be* threaded at this time, right?

Yes. Up to this point it is correct, and we are on the same page. The child ptid is ptid_t (pid, 0, 0). The child should not be threaded here, so we have reported to the GDB core that the parent has a non-threaded child.

>So how is it possible that in between wait () setting the child_ptid and infrun.c
>using it to switch to the child, the child is becoming multi-threaded? Where is
>the sync_threadlists () call that makes this happen?

>I think we should understand better how this could have happened.

I'm sorry, I left out a piece of information. The parent process is loaded and is multi-threaded. The child is then loaded, and through wait () we have reported that a fork () event has happened and given the GDB core the information it needs. The child's object file will be loaded shortly after. So new_objfile () is called, which in turn calls pd_enable (); pd_enable () calls pd_activate (), then pd_update (), then sync_threadlists (). Inside sync_threadlists (), the child's ptid gets synced to ptid_t (pid, 0, utid), because cmp_result is positive: pbuf has a user thread ID, but gbuf does not and only holds the non-threaded ptid. This is where things go wrong: we end up changing the ptid via thread_change_ptid (). After this, the child has a threaded ptid, but the GDB core is still using ptid_t (pid, 0, 0). Perhaps the GDB core updates this ptid later; I am not sure of that. Either way, we need to stop pd_activate () from syncing the thread lists when it is called for a child process whose object file has just been loaded, and which the GDB core has not yet switched to after detaching from the parent (as the user's detach-on-fork and follow-fork-mode child settings request).
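The sequence above can be modelled with a small, self-contained C toy. This is a minimal sketch under stated assumptions, not GDB code: `toy_ptid`, `toy_thread_list`, `toy_sync_threadlists` and `toy_switch_to_thread` are hypothetical stand-ins for ptid_t, gbuf, sync_threadlists () and switch_to_thread (). It assumes infrun has already cached the child's ptid as (pid, 0, 0) from the fork event; renaming the only thread to (pid, 0, utid) afterwards leaves that cached ptid matching nothing, which is what the `thr != NULL` assertion catches. The hypothetical `single_thread_guard` flag models the single-thread check from the patch (leave a process with only one thread alone).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GDB's ptid_t: (pid, lwp, tid).  */
struct toy_ptid
{
  long pid;
  long lwp;
  unsigned long tid;
};

/* Hypothetical per-inferior thread list, i.e. roughly what gbuf holds.  */
#define TOY_MAX_THREADS 8
struct toy_thread_list
{
  struct toy_ptid ptid[TOY_MAX_THREADS];
  int count;
};

static bool
toy_ptid_equal (struct toy_ptid a, struct toy_ptid b)
{
  return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

/* Return the matching entry, or NULL if this ptid is unknown.  */
static struct toy_ptid *
toy_find_thread (struct toy_thread_list *list, struct toy_ptid ptid)
{
  for (int i = 0; i < list->count; i++)
    if (toy_ptid_equal (list->ptid[i], ptid))
      return &list->ptid[i];
  return NULL;
}

/* Models the problematic step: pbuf reports pthread UTID while the list
   only has the non-threaded ptid (pid, 0, 0), so that entry is renamed
   in place, as thread_change_ptid () would.  With SINGLE_THREAD_GUARD
   set, a process with only one thread is left untouched.  */
static void
toy_sync_threadlists (struct toy_thread_list *list, unsigned long utid,
                      bool single_thread_guard)
{
  if (single_thread_guard && list->count == 1)
    return;
  for (int i = 0; i < list->count; i++)
    if (list->ptid[i].tid == 0)
      {
        list->ptid[i].tid = utid;
        return;
      }
}

/* Models switch_to_thread (): the thread must exist.  */
static struct toy_ptid *
toy_switch_to_thread (struct toy_thread_list *list, struct toy_ptid ptid)
{
  struct toy_ptid *thr = toy_find_thread (list, ptid);
  assert (thr != NULL);
  return thr;
}
```

Without the guard, looking up the cached (pid, 0, 0) after the sync returns NULL; with the guard, the lookup still succeeds. The guard only sidesteps the rename; it does not answer whether the core should instead learn the new ptid.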
As you may recall, we check inf->in_initial_library_scan, but in this case that flag does not prevent the bug. That is why the patch in my previous email checked whether the process has only one thread and, if so, did not change its ptid to a threaded one.

So that is the thought process. Let me know what you think. Below is the output with prints added in pd_update () and pd_enable (); it shows clearly why this is happening. Hope it helps.

Have a nice day ahead.

Thanks and regards,
Aditya.

-------------------------------------------------------
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
pd_update pid = 9044280
pid in sync_threadlists () is 9044280
pd_update pid = 9044280
pid in sync_threadlists () is 9044280
pd_update pid = 9044280
pid in sync_threadlists () is 9044280
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 13763052]
[New inferior 2 (process 13763052)]
[Detaching after fork from parent process 9044280]
Hello from Parent!
[Inferior 1 (process 9044280) detached]
Hello from Child!
Hello from Parent!
In pd_enable with pid 13763052
pd_update pid = 13763052
pid in sync_threadlists () is 13763052
thread.c:1385: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

From: Ulrich Weigand
Date: Monday, 20 November 2023 at 4:57 PM
To: gdb-patches@sourceware.org, Aditya Kamath1
Cc: Sangamesh Mallayya
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork

Aditya Kamath1 wrote:
>Assume we have set detach_on_fork = on and set follow-fork-mode child.
>In AIX, on a fork () event we set our status and return our parent ptid from rs6000-aix-nat.c.
>Once the object file of the new inferior (the child process) is loaded, we call pd_enable () to
>set our thread target and sync our thread lists. In sync_threadlists, pbuf holds
>our pthread library threads and gbuf holds the GDB threads that have been
>registered with the GDB core.

>While I cannot say with 100% certainty where the GDB core got this ptid and why it did not
>update it to ptid_t (pid, 0, tid), my observation after debugging is that the GDB core
>got ptid_t (pid, 0, 0) from rs6000-aix-nat.c, inside the wait () where we informed GDB,
>by setting a status, that this is a child process belonging to a parent process on a fork event.
>GDB could not change the ptid it got during the fork event, even though we changed it later
>via sync_threadlists () in aix-thread.c for the threaded event.

So here's the piece I do not fully understand: Yes, wait () inside rs6000-aix-nat.c will set the ourstatus child_ptid to a non-threaded ptid. But that's for a newly created child process that should not *be* threaded at this time, right?

So how is it possible that in between wait () setting the child_ptid and infrun.c using it to switch to the child, the child is becoming multi-threaded? Where is the sync_threadlists () call that makes this happen?

I'm aware that the wait () in aix-thread.c (which is the caller of the rs6000-aix-nat.c one) does perform a pd_enable / sync_threadlists, but only on the *parent*, not on the child. That should happen only later.

I think we should understand better how this could have happened.

If there is a good reason why the child can already be multi-threaded, then one option to fix this would be to switch ourstatus->child_ptid to the multi-threaded version in the *aix-thread.c* version of wait (), just like it switches the returned ptid.

Bye,
Ulrich
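The alternative suggested above can be sketched in the same toy style. Every name here is hypothetical (`toy_waitstatus`, `toy_thread_wait`, and the `parent_utid` / `child_utid` parameters); this is not GDB's actual wait () code. The sketch simply assumes the thread layer already knows a pthread ID for the child when the fork event is reported, which is exactly the open question in this thread, and passes it in as a parameter.

```c
#include <assert.h>

/* Hypothetical stand-in for GDB's ptid_t: (pid, lwp, tid).  */
struct toy_ptid
{
  long pid;
  long lwp;
  unsigned long tid;
};

enum toy_waitkind { TOY_STOPPED, TOY_FORKED };

struct toy_waitstatus
{
  enum toy_waitkind kind;
  struct toy_ptid child_ptid;   /* Only meaningful for TOY_FORKED.  */
};

/* Turn a non-threaded ptid (pid, 0, 0) into (pid, 0, utid) when a
   pthread ID is known; UTID == 0 means "not threaded".  */
static struct toy_ptid
toy_threaded_ptid (struct toy_ptid ptid, unsigned long utid)
{
  if (utid != 0)
    ptid.tid = utid;
  return ptid;
}

/* Models the thread layer's wait () wrapper: it already rewrites the
   ptid it returns; under this sketch it also rewrites the fork child's
   ptid the same way, so infrun never sees the stale (pid, 0, 0).  */
static struct toy_ptid
toy_thread_wait (struct toy_ptid returned, struct toy_waitstatus *status,
                 unsigned long parent_utid, unsigned long child_utid)
{
  if (status->kind == TOY_FORKED)
    {
      /* A fork child is always a different process from the parent.  */
      assert (status->child_ptid.pid != returned.pid);
      status->child_ptid = toy_threaded_ptid (status->child_ptid, child_utid);
    }
  return toy_threaded_ptid (returned, parent_utid);
}
```

The design point is symmetry: whatever translation the thread layer applies to the returned ptid is also applied to child_ptid, so both ptids handed to infrun are in the same (threaded) form.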