Hi Ulrich and community,

Please find the new patch [See:- 0001-Fix-multi-thread-bug-in-AIX.patch ].

I understood your previous email, and what you are saying is correct: if we fix this top target, we can leave the process layer undisturbed. Having said that, I am facing a few obstacles in achieving this. Kindly note that all outputs I paste in this mail were generated with the "set debug aix-thread" and "set detach-on-fork off" commands.

We first look up the symbol where we need to place a trap, so that the debugger gets notified that it needs to catch an event. In our case this is a thread-creation event. It is caught by a condition in the aix-thread wait layer (aix_thread_target::wait), and we then call pd_update (), which calls sync_threadlists (), to pick it up. For this to work, the symbol "__n_pthreads" must have an address in the symbol table. This symbol is checked whenever a new objfile is created: pd_enable () calls pthdb_session_pthreaded () to check it. If that succeeds, we go into pd_update (), do our work, and push the aix-thread.c target as the top target.

All of this happens correctly for the parent in program 1 {code attached below}, as shown in the output below.

Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork
pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffffdbc8, count = 1)
symbols[0].name = "__n_pthreads"
returning PDC_FAILURE
pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffffdbc8, count = 1)
symbols[0].name = "__n_pthreads"
symbols[0].addr = 0xf0807334
returning PDC_SUCCESS
pdc_read_data (user_current_pid = 17957132, buf = 0xfffffffffffdbc0, addr = 0xf0807334, len = 4)
status=0, returning SUCCESS
pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffff88c8, count = 1)
symbols[0].name = "__n_pthreads"
symbols[0].addr = 0xf0807334

So, after this, the first inferior works fine.
When the second or third inferior comes into the picture, new_objfile () takes us to pd_enable () and then to pthdb_session_pthreaded (). Here we fail for the new inferior, as shown in the output below.

[New Thread 258]
[New Thread 515]
fetch_regs_kernel_thread tid=225018d regno=64 arch64=0
[New inferior 2 (process 8061286)]
pdc_free (user_current_pid = 17957132, buf = 0x11016f370)
pdc_free (user_current_pid = 17957132, buf = 0x11016f3b0)
pdc_free (user_current_pid = 17957132, buf = 0x11016f4f0)
pdc_free (user_current_pid = 17957132, buf = 0x1104e3a70)
pdc_free (user_current_pid = 17957132, buf = 0x1108af0d0)
pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffffdef8, count = 1)
symbols[0].name = "__n_pthreads"
returning PDC_FAILURE
pdc_symbol_addrs (user_current_pid = 8061286, symbols = 0xfffffffffffe248, count = 1)
symbols[0].name = "__n_pthreads"
returning PDC_FAILURE
I am parent
[New process 17957132]
[New inferior 3 (process 17433000)]
pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffffdef8, count = 1)
symbols[0].name = "__n_pthreads"
returning PDC_FAILURE

Because the symbol "__n_pthreads" could not be read, the check failed, and we could not set the top target of the new process to the thread layer. I could not find out why this happens. If the parent is pthreaded, the child should be too, since everything in the parent is copied to the child. So I would expect the child to be pthreaded as well, with "__n_pthreads" resolved to its address in the child process.

As a result, our top target remained the process layer. In target.c, when we wait for the event, the current inferior is the child, and its top target is the process layer. The process layer did recognise the process correctly, since our parent is threaded, but we do not have a ptid_t (pid) for it. Hence the line [New process 17957132] appeared in the output.
I did try searching in xcoffread.c, but I felt I was looking in the wrong place for things the pthread debug library should define for us. This is where I need guidance. Your help would be useful in solving this problem for AIX and the GDB community. Kindly guide me with your expertise and let me know what you think. I have given all the information I understand so far. Let me know if you need more information to guide me.

Waiting for a reply soon. Have a nice day ahead.

Regards,
Aditya.

-----------------------------------
PROGRAM 1

#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

/* Runs in the thread created by the grandchild.  */
void *
dummy_thread_function (void *arg)
{
  printf ("Bye from dummy thread \n");
  return NULL;
}

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  /* Every thread that gets here forks a child; each child forks a
     grandchild, which then creates one more thread.  */
  pid_t child;

  child = fork ();
  if (child > 0)
    printf ("I am parent \n");
  else
    {
      printf (" Iam child \n");
      child = fork ();
      if (child > 0)
        printf ("From child I became a parent \n");
      else
        {
          printf ("I am grandchild \n");
          pthread_t thread;
          pthread_create (&thread, NULL, dummy_thread_function, NULL);
        }
    }

  while (1); /* break here */
}

int
main (void)
{
  int i;
  pthread_t thread[NUM_THREADS];

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      int res;

      res = pthread_create (&thread[i], NULL, thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    {
      sleep (15);
    }

  return 0;
}

________________________________
From: Ulrich Weigand
Sent: 08 December 2022 21:59
To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org
Cc: Sangamesh Mallayya
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch

Aditya Kamath1 wrote:
>>So this last bit seems to be the problem. Could you elaborate on
>>what the exact call stack is?
>>I thought once the thread layer is initialized, calls to ::wait should always go through it ...
>
>Kindly see the backtrace sections
>BT:- Thread_wait [which is on a thread event like new thread born or main process is pthreaded],
>BT:- Post thread wait in rs6000-aix-nat::wait [which is the beneath ()->wait () in aix_thread_target::wait],
>BT:- If direct rs6000-aix-nat::wait [ where in output 3 and 4 {below in this email} you can see it will directly come to rs6000-aix-nat.c if the main process after having threads forks or uses a fork () call ] pasted below in this email.

I'm only replying to this right now, because that seems to be the fundamental problem that ultimately causes a lot of the other issues you're seeing.

It seems the core problem is that you're not initializing the thread layer correctly for any but the first inferior! So all other inferiors started with fork are assumed to be single-threaded ...

If you look at a backtrace like this:

>BT:- If direct rs6000-aix-nat::wait
>
>Thread 1 hit Breakpoint 2, rs6000_nat_target::wait (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...,
>    ourstatus=0xffffffffffff360, options=...) at rs6000-aix-nat.c:695
>695       set_sigint_trap ();
>(gdb) bt
>#0  rs6000_nat_target::wait (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...,
>    ourstatus=0xffffffffffff360, options=...) at rs6000-aix-nat.c:695
>#1  0x0000000100340778 in target_wait (ptid=..., status=0xffffffffffff360, options=...) at target.c:2598

you see that the target.c code uses the current inferior's "top_target" to find the appropriate target routines:

  target_ops *target = current_inferior ()->top_target ();
  [...]
  ptid_t event_ptid = target->wait (ptid, status, options);

For a multi-threaded process "top_target" *should* point to aix_thread_ops, which is achieved by this call in pd_enable:

  current_inferior ()->push_target (&aix_thread_ops);

However, note that this is applied only to *one* inferior.
You actually need to do this for *all* new inferiors as soon as they are detected to become multi-threaded.

This does not happen because aix-thread.c currently has a static global pd_able variable that applies to GDB as a whole. Back in the days where this was introduced, that was probably correct since a single GDB session could only debug one single inferior back then. But for multiple inferiors, any of which can be multi-threaded, this does not work.

I think you should first of all work on fixing this, and then go back to validating your test scenarios without any of the other changes - many of those likely will no longer be necessary then.

Bye,
Ulrich