Hi Ulrich and community,

Please find attached the patch. [See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch]

>I think you'll have to allow for that modified form of the name as well.

I have allowed for it. Please see the solib-aix change. With this we are
able to read all the symbols in any inferior successfully. One can verify
this with "set debug aix-thread". If one runs the "info sharedlibrary"
command, one can see the 4 libraries for any inferior. Kindly check
output 1, pasted below in this email, for program 1.

>>Even if I allow a pathname match to the member_name we end up losing all the
>>information of our threads in the first process though we still have the
>>process information.

>This needs further debugging to understand what's going on once you allow
>that match. That original problem should be fixed by that change, so
>there's probably something else as well ...

Yeah. As mentioned in my previous mail, we are losing our thread
information. Although the output does show a new thread, my attempts to
find the root cause in the code have failed. Thread 1 belonging to
process 1 is shown as 2.1 in output 2, pasted below. What's worse, the
top target is also not set properly in the course of getting the right
name for the shared library. I also have the correct program space while
reading the symbols. Since the top target is wrong, the output shows a
new process even though it is a threaded one.

So it looks like I have missed something in the code base, minor or
major, that is causing the bug. I have tried debugging aix-thread.c, but
things look correct there, at least. Single-inferior examples with
multiple threads pass, but multi-inferior examples with multiple threads
fail.

Kindly guide me on what I am missing here. It is surely something that I
have not explored and am unaware of. Your expertise can help us resolve
this bug.

Thank you for the guidance so far on this bug. Waiting for a reply soon.

Have a nice day ahead.

Thanks and regards,
Aditya.

---------------------------------------------------------------------------
Program 1:-

/* The header names were lost from this message; the five below are the
   ones this program needs.  */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

pthread_barrier_t barrier;

#define NUM_THREADS 1

void *
thread_function (void *arg)
{
  /* Wait until all threads have been created.  */
  pthread_barrier_wait (&barrier);

  pid_t child;

  /* Fork twice so that GDB sees a parent, a child and a grandchild.  */
  child = fork ();
  if (child > 0)
    printf ("I am parent \n");
  else
    {
      child = fork ();
      if (child > 0)
        printf ("I am child \n");
      else
        printf ("I am grandchild \n");
    }
  while (1); /* break here */
}

int
main (void)
{
  int i;
  pthread_t thread[NUM_THREADS];

  /* Make sure the test does not run forever.  */
  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      int res;

      res = pthread_create (&thread[i], NULL, thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    {
      sleep (15);
    }

  return 0;
}

---------------------------------------------------------------------------------------------------------
Output 1:-

Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork
[New Thread 258]
[New inferior 2 (process 15925744)]
I am parent
^C[New process 11665696]

Thread 1.3 received signal SIGINT, Interrupt.
0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb) info sharedlibrary
From        To          Syms Read   Shared Object Library
0xd05bc124  0xd05bf194  Yes (*)     /usr/lib/libpthreads.a(shr_comm.o)
0xd05bb240  0xd05bb9a1  Yes (*)     /usr/lib/libcrypt.a(shr.o)
0xd0576180  0xd05ba731  Yes (*)     /usr/lib/libpthread.a(shr_xpg5.o)
0xd0100e00  0xd0575123  Yes (*)     /usr/lib/libc.a(shr.o)
(*): Shared library is missing debugging information.
(gdb) inferior 2
[Switching to inferior 2 [process 15925744] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)]
[Switching to thread 2.1 (Thread 258)]
#0  0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb) info sharedlibrary
From        To          Syms Read   Shared Object Library
0xd05bc124  0xd05bf194  Yes (*)     /usr/lib/libpthreads.a(shr_comm.o)
0xd05bb240  0xd05bb9a1  Yes (*)     /usr/lib/libcrypt.a(shr.o)
0xd0576180  0xd05ba731  Yes (*)     /usr/lib/libpthread.a(shr_xpg5.o)
0xd0100e00  0xd0575123  Yes (*)     /usr/lib/libc.a(shr.o)
(*): Shared library is missing debugging information.
(gdb)

---------------------------------------------------------------------------------------------------------
Output 2:-

Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork
[New Thread 258]
[New inferior 2 (process 16122342)]
I am parent
^C[New process 11665700]

Thread 1.3 received signal SIGINT, Interrupt.
0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb) info threads
  Id   Target Id                           Frame
* 1.3  process 11665700                    0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
  2.1  Thread 258 (tid 28115287, running)  0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb)

________________________________
From: Ulrich Weigand
Sent: 09 January 2023 19:34
To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org
Cc: Sangamesh Mallayya
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch

Aditya Kamath1 wrote:

>Here I have added a print statement to ensure we are able to find the member in the archive.
>
>What's interesting is for the first inferior this works fine for all shared libraries.
>For the second one and every inferior thereafter the output is as shown below in the next paragraph,
>object_bfd is shr.o --- compared with --- member_name is shr_comm.o in path /usr/lib/libpthreads.a(shr_comm.o)
>object_bfd is /usr/lib/libpthreads.a(shr_comm.o) --- compared with --- member_name is shr_comm.o
>in path /usr/lib/libpthreads.a(shr_comm.o)
>I was surprised that the bfd_get_filename (object_bfd.get ()) is returning the pathname
>instead of the object file descriptor. Everything until here seems to correct in the
>solib_aix_bfd_open () function and this makes it hard for me to understand what is going on.

Looks like this is because solib_aix_bfd_open *changes* the BFD filename here:

  /* Override the returned bfd's name with the name returned from solib_find
     along with appended parenthesized member name in order to allow commands
     listing all shared libraries to display.  Otherwise, we would only be
     displaying the name of the archive member object.  */
  std::string fname = string_printf ("%s%s",
                                     bfd_get_filename (archive_bfd.get ()),
                                     sep);
  bfd_set_filename (object_bfd.get (), fname.c_str ());

so when the same BFD gets checked a second time, you'll now see the changed
filename instead of the original one.

I think you'll have to allow for that modified form of the name as well.

>Even if I allow a pathname match to the member_name we end up losing all the
>information of our threads in the first process though we still have the
>process information.

This needs further debugging to understand what's going on once you allow
that match. That original problem should be fixed by that change, so
there's probably something else as well ...

Bye,
Ulrich
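
For readers following the thread, here is a minimal sketch of what "allowing for that modified form of the name" could look like. It is not the code from the attached patch and not an existing GDB function. The helper name member_name_matches is hypothetical, and the sketch assumes the overridden name always has the form "archive(member)", as in the debug output quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of GDB.  Returns true when BFD_FILENAME
// refers to the archive member MEMBER_NAME, either as the bare member
// name ("shr_comm.o") or in the overridden "archive(member)" form
// ("/usr/lib/libpthreads.a(shr_comm.o)").
bool
member_name_matches (const std::string &bfd_filename,
                     const std::string &member_name)
{
  // Assumption: callers never pass an empty member name.
  assert (!member_name.empty ());

  // First lookup: the BFD still carries the bare member name.
  if (bfd_filename == member_name)
    return true;

  // Later lookups: the name may have been overridden, so accept any
  // filename that ends in "(member_name)" after a non-empty archive path.
  const std::string suffix = "(" + member_name + ")";
  return (bfd_filename.size () > suffix.size ()
          && bfd_filename.compare (bfd_filename.size () - suffix.size (),
                                   suffix.size (), suffix) == 0);
}
```

With the names from the debug output, both "shr_comm.o" and "/usr/lib/libpthreads.a(shr_comm.o)" would match member name "shr_comm.o", while "shr.o" would not. A suffix match does not check which archive the member belongs to, so a real fix might also want to compare the archive path.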