Hi Ulrich, Tom and community. Please find attached the patch. {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch} >Not sure if I follow you exactly here, but my understanding is >indeed that follow_fork should initially create an inferior for >the new process with just a single main thread using (pid, 0, 0). >Subsequently, aix-thread should detect that the new inferior >uses pthreads and then switch its ptid to (pid, 0, tid). >Not sure where exactly this goes wrong for you. What is the >path leading to this crash you're observing? Let me explain this using the example code pasted below this email. Consider we now follow the child {which is the only case, previous version of the patch was failing}. --------------------------------------------------------------------- >(gdb) set follow-fork-mode child >(gdb) set detach-on-fork off >(gdb) r >Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork >[New Thread 258] >[New Thread 515] >[Attaching after Thread 515 fork to child process 10748216] >[New inferior 2 (process 10748216)] --------------------------------------------------------------------- GDB until here know that a fork event occurred and the child process is <10748216, 0, 0> via wait () in the rs6000-aix-nat.c file. The child in the GDB core is . GDB has three things to do here. ** It will get a new object file notifier of the new process, ** new inferior is created so the new inferior notifier {Kindly See initialize_aix_thread () in aix-thread.c where aix_thread_inferior_created () will be called } ** And follow_fork () event since GDB must decide it has to follow the child. Note:- Our child process is This is the exact order in which the above 3 things will be executed from the main GDB event loop. So, the first task gets done. The new object file notifier has called pd_enable (), to pd_activate () to sync_threadlist () where we will change child process to if the libpthdebug session is initialised successfully. The second task now, new inferior notifier calls aix_thread_inferior_created () to pd_enable () (), to pd_activate () to sync_threadlist (). Now we have changed the child process from to . But no one has informed the GDB core about this change. It is still of the view that child process is from the set_forked () status it got from rs6000-aix-nat wait () while a fork () event was being detected. So when it attempts to handle the third task which is to follow the child process since we have given that command, the child ptid passed to switch_to_thread is . But this does not exists. Since the first two events changed it. So GDB got surprised on not finding it and failed to debug further thereby giving an option to dump the core with an assertion below. {Kindly see the back trace I pasted in the previous email. One can verify this}. ----------------------------------------------------------------------- >thread.c:1337: internal-error: switch_to_thread: Assertion `thr != NULL' ?>failed. >A problem internal to GDB has been detected, >further debugging may prove unreliable. ---------------------------------------------------------------------------- The game is about when we decide to set the pd_active or say thread debugging active. We did it in two places. In the wait () and pd_enable () in aix-thread.c file. We need it to put it in the correct place which we aren’t correctly. If there is a need for any non-target function like follow_fork to use ptid_t (pid, 0 , 0) then we must not change the thread ptid though the library is initialised. This is where we are going wrong. Let me also tell you all what I tried today and failed. I had removed pd_activate () from the pd_enable () and called pd_activate () only from wait () for all the events. {Currently we do only for trap}. But the problem is if we have many inferiors and couple of them with no private data set, since we didn’t in pd_enable () via pd_activate () as we removed it, then at any point if we iterate_over_threads () over these inferiors we will crash since these don’t have private data set yet. So this is a tricky problem. Wonder how Linux folks did it. If anyone knows it will be of help if shared. Kindly let me know at high level. I could not understand how they are handling the same. >But you can determine >the inferior associated with a PID directly via find_inferior_pid >without ever involving any thread. This is done in this patch. Thank you Ulrich. Let me know what you think. Awaiting for a reply. Have a nice day ahead. Thanks and regards, Aditya. ------------------------------------------------------------------- Code Code :- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else { child = fork (); if (child > 0) printf ("I am child \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; pthread_t thread[NUM_THREADS]; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { int res; res = pthread_create (&thread[i], NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); } return 0; } From: Ulrich Weigand Date: Friday, 17 February 2023 at 1:16 AM To: simark@simark.ca , Aditya Kamath1 , gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Aditya Kamath1 wrote: >So what is happening is that the when after a new process is born, >its pthread library is getting intialised and we have changed its >ptid from ptid (pid, 0, 0) to ptid (pid, 0, tid). Since we follow >fork the code in inferior.c file will switch to the thread child >where the child is reported as ptid (pid, 0, 0) but exists as >ptid (pid, 0, tid). This leads to this crash. We did try with two >variables if you recall in the previous patch. But your point of >pd_active being there for it was valid. So somehow something isn't >correct that I did not understand. We have pd_activate () in only >two places. So is follow_fork () is expecting us to switch to child >process and then change the ptid of the child?? If yes, how do we go?? >And if not where are we going wrong here. Not sure if I follow you exactly here, but my understanding is indeed that follow_fork should initially create an inferior for the new process with just a single main thread using (pid, 0, 0). Subsequently, aix-thread should detect that the new inferior uses pthreads and then switch its ptid to (pid, 0, tid). Not sure where exactly this goes wrong for you. What is the path leading to this crash you're observing? >Also this ptid_t (pid, 0, 0) and our main thread being >ptid_t (pid, 0, tid) might need a smarted way to switch to the >main thread's process space and set the right current inferior >process in pdc_read_memory. Kindly check it in this patch and >let me know if we can do it better. So you currently do: + thread_info *thread = find_thread_ptid (current_inferior (), + ptid_t (user_current_pid)); + /* If the pthread debug library is loaded, then we need the ptid_t (pid, 0 ,tid). + Since the main thread in the below for loop will be in the first iteration + we will break. */ + if (!thread) + { + for (thread_info *tp: all_threads (current_inferior ()->process_target (), + ptid_t (user_current_pid))) + { + thread = tp; + break; + } + } [...] + { + scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid); + inferior_ptid = ptid_t (user_current_pid); + scoped_restore_current_inferior restore_inferior; + set_current_inferior (thread->inf); + + scoped_restore_current_program_space restore_current_progspace; + set_current_program_space (thread->inf->pspace); + status = target_write_memory (addr, (gdb_byte *) buf, len); + } This is overkill. Note that at no point do you actually ever use "thread" itself, only "thread->inf". But you can determine the inferior associated with a PID directly via find_inferior_pid without ever involving any thread. Bye, Ulrich