Hi Ulrich and community,

Please find attached the patch {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch}

>I think the problem here may be that the lookup_minimal_symbol
>call in pdc_symbol_addrs has to be performed in the correct
>process address space. This wasn't an issue before since the
>routine was only called for the first process anyway.

>Look at the equivalent routine on Linux, which is
>ps_pglobal_lookup in proc-service.c. This does:

>inferior *inf = ph->thread->inf;
> scoped_restore_current_program_space restore_pspace;
> set_current_program_space (inf->pspace);

> You'll need to do the equivalent (set the program space
> to the one appropriate for the inferior referenced by
> user_current_pid).

I tried setting the program space as suggested in this patch, but it did not work. I also tried setting inferior_ptid and current_inferior to match user_current_pid, but that did not help either. For the same code as in my previous mail, the output still appears as follows.

[New Thread 258]
[New Thread 515]
fetch_regs_kernel_thread tid=1da0189 regno=64 arch64=0
[New inferior 2 (process 6553888)]
pdc_free (user_current_pid = 11272598, buf = 0x11016ee70)
pdc_free (user_current_pid = 11272598, buf = 0x11016eeb0)
pdc_free (user_current_pid = 11272598, buf = 0x11016eff0)
pdc_free (user_current_pid = 11272598, buf = 0x1104e2530)
pdc_free (user_current_pid = 11272598, buf = 0x1108af2d0)
pdc_symbol_addrs (user_current_pid = 11272598, symbols = 0xfffffffffffdf08, count = 1)
  symbols[0].name = "__n_pthreads"
  symbols[0].addr = 0xf0807334
  returning PDC_SUCCESS
pdc_read_data (user_current_pid = 11272598, buf = 0xfffffffffffdf00, addr = 0xf0807334, len = 4)
  status=0, returning SUCCESS
pdc_symbol_addrs (user_current_pid = 6553888, symbols = 0xfffffffffffe258, count = 1)
  symbols[0].name = "__n_pthreads"
  returning PDC_FAILURE
[New process 11272598]
[New inferior 3 (process 8323536)]
pdc_symbol_addrs (user_current_pid = 11272598, symbols = 0xfffffffffffdf08, count = 1)
  symbols[0].name = "__n_pthreads"
  symbols[0].addr = 0xf0807334
  returning PDC_SUCCESS
pdc_read_data (user_current_pid = 11272598, buf = 0xfffffffffffdf00, addr = 0xf0807334, len = 4)
  status=0, returning SUCCESS
pdc_symbol_addrs (user_current_pid = 8323536, symbols = 0xfffffffffffe258, count = 1)
  symbols[0].name = "__n_pthreads"
  returning PDC_FAILURE
I am parent
I am parent

So we can clearly see that GDB did try to check whether the new inferiors can be thread-debugged, and that the check failed: the symbol lookup returns PDC_FAILURE for both new pids (6553888 and 8323536), while it succeeds for the parent (11272598).

While trying to work out how Linux handles this, I observed that once it detects a new inferior, it appears to call ps_pglobal_lookup in proc-service.c repeatedly, via an observable, until it succeeds in reading the symbol. Kindly see the Linux output below for the same program.

{Output Credits:- Linux GDB}

[New Thread 0x7ffff7cff170 (LWP 259785)]
[New Thread 0x7ffff74ef170 (LWP 259786)]
[New inferior 2 (process 259787)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New inferior 3 (process 259788)]

The backtrace might be as follows:

#11 0x0000000010063fe8 in std::_Function_handler::_M_invoke(std::_Any_data const&, inferior*&&) at /usr/include/c++/8/bits/std_function.h:297
#12 0x00000000102c77b8 in std::function::operator()(inferior*) const ( __args#0=0x112cd920, this=) at /usr/include/c++/8/bits/std_function.h:687
#13 gdb::observers::observable::notify (args#0=0x112cd920, this=) at ./../gdbsupport/observable.h:166
#14 post_create_inferior at infcmd.c:315
#15 0x00000000102e25a4 in follow_fork_inferior (detach_fork=false, follow_child=false) at infrun.c:683
#16 follow_fork () at infrun.c:79
#17 0x00000000102ec7b8 in handle_inferior_event at infrun.c:5728
#18 0x00000000102ed9d8 in fetch_inferior_event () at infrun.c:4233
#19 0x00000000102c05a4 in inferior_event_handler at inf-loop.c:41
#20 0x0000000010317e7c in handle_target_event at -nat.c:4216
#21 0x0000000010957e68 in handle_file_event at event-loop.cc:549
#22 0x00000000109588c4 in gdb_wait_for_event at event-loop.cc:670
#23 0x0000000010958cac in gdb_wait_for_event (block=0) at event-loop.cc:569
#24 gdb_do_one_event () at event-loop.cc:210
#25 0x000000001034d684 in start_event_loop () at main.c:411
#26 captured_command_loop () at main.c:471
#27 0x000000001034f8b0 in captured_main at main.c:1329
#28 gdb_main (args=) at main.c:1344
#29 0x000000001001a188 in main at gdb.c:32

Beyond frame #11, they might be going to ps_pglobal_lookup () every time, until thread debugging becomes possible for the new inferior.

AIX, on the other hand, calls pdc_symbol_addrs () only once for each new inferior after the first. Even for the first inferior, the lookup succeeds only on the fourth attempt, as shown below for a 32-bit program.

pdc_symbol_addrs (user_current_pid = 11272598, symbols = 0xfffffffffffdbd8, count = 1)
  symbols[0].name = "__n_pthreads"
  returning PDC_FAILURE
pdc_symbol_addrs (user_current_pid = 11272598, symbols = 0xfffffffffffdbd8, count = 1)
  symbols[0].name = "__n_pthreads"
  returning PDC_FAILURE
pdc_symbol_addrs (user_current_pid = 11272598, symbols = 0xfffffffffffdbd8, count = 1)
  symbols[0].name = "__n_pthreads"
  returning PDC_FAILURE
pdc_symbol_addrs (user_current_pid = 11272598, symbols = 0xfffffffffffdbd8, count = 1)
  symbols[0].name = "__n_pthreads"
  symbols[0].addr = 0xf0807334
  returning PDC_SUCCESS

So I guess that if we manage to retry the lookup for new inferiors in the same way it is retried for the first inferior, we will get to the solution. However, I do not understand how Linux reads the symbol again and again for a new inferior, or, for that matter, how AIX does so for the first inferior.

Kindly let me know how we can do something similar, or whether there is something I have overlooked in our attempt to solve this for AIX and the GDB community.

Let me know what you think. Waiting for a reply soon.

Have a nice day.

Regards,
Aditya.

________________________________
From: Ulrich Weigand
Sent: 15 December 2022 21:23
To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org
Cc: Sangamesh Mallayya
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch

Aditya Kamath1 wrote:

>[New Thread 258]
>[New Thread 515]
>fetch_regs_kernel_thread tid=225018d regno=64 arch64=0
>[New inferior 2 (process 8061286)]
>pdc_free (user_current_pid = 17957132, buf = 0x11016f370)
>pdc_free (user_current_pid = 17957132, buf = 0x11016f3b0)
>pdc_free (user_current_pid = 17957132, buf = 0x11016f4f0)
>pdc_free (user_current_pid = 17957132, buf = 0x1104e3a70)
>pdc_free (user_current_pid = 17957132, buf = 0x1108af0d0)
>pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffffdef8, count = 1)
> symbols[0].name = "__n_pthreads"
> returning PDC_FAILURE
>pdc_symbol_addrs (user_current_pid = 8061286, symbols = 0xfffffffffffe248, count = 1)
> symbols[0].name = "__n_pthreads"
> returning PDC_FAILURE
>I am parent
>[New process 17957132]
>[New inferior 3 (process 17433000)]
>pdc_symbol_addrs (user_current_pid = 17957132, symbols = 0xfffffffffffdef8, count = 1)
> symbols[0].name = "__n_pthreads"
> returning PDC_FAILURE

I think the problem here may be that the lookup_minimal_symbol
call in pdc_symbol_addrs has to be performed in the correct
process address space. This wasn't an issue before since the
routine was only called for the first process anyway.

Look at the equivalent routine on Linux, which is
ps_pglobal_lookup in proc-service.c. This does:

  inferior *inf = ph->thread->inf;
  scoped_restore_current_program_space restore_pspace;
  set_current_program_space (inf->pspace);

You'll need to do the equivalent (set the program space
to the one appropriate for the inferior referenced by
user_current_pid).

Bye,
Ulrich
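
For readers following the thread, the save/switch/restore pattern Ulrich describes can be modelled outside GDB roughly as below. This is only a sketch under stated assumptions: program_space and inferior here are simplified stand-ins rather than GDB's real classes, and scoped_restore_pspace, lookup_in_current_pspace and symbol_addr_for_inferior are hypothetical names invented for illustration. The pids and the address are taken from the log earlier in the thread. The child's empty symbol table only serves to show that the lookup result depends on which program space is current; it is not a claim about why the real AIX lookup fails.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for a program space: one symbol table per process image.
struct program_space
{
  std::map<std::string, unsigned long> minimal_symbols;
};

// Simplified stand-in for an inferior: a pid plus its program space.
struct inferior
{
  int pid;
  program_space *pspace;
};

// Global "current" program space, as consulted by symbol lookups.
static program_space *current_program_space = nullptr;

// Hypothetical RAII guard: remembers the current program space on
// construction and puts it back on destruction.
class scoped_restore_pspace
{
public:
  scoped_restore_pspace () : m_saved (current_program_space) {}
  ~scoped_restore_pspace () { current_program_space = m_saved; }
  scoped_restore_pspace (const scoped_restore_pspace &) = delete;
  scoped_restore_pspace &operator= (const scoped_restore_pspace &) = delete;

private:
  program_space *m_saved;
};

// Hypothetical lookup that only searches the *current* program space.
static bool
lookup_in_current_pspace (const std::string &name, unsigned long *addr)
{
  if (current_program_space == nullptr)
    return false;
  auto it = current_program_space->minimal_symbols.find (name);
  if (it == current_program_space->minimal_symbols.end ())
    return false;
  *addr = it->second;
  return true;
}

// Switch to the inferior's program space for the duration of the lookup,
// then restore whatever was current before (even on early return).
static bool
symbol_addr_for_inferior (const inferior &inf, const std::string &name,
                          unsigned long *addr)
{
  scoped_restore_pspace restore;
  current_program_space = inf.pspace;
  return lookup_in_current_pspace (name, addr);
}

// Small usage example with illustrative data.
static void
example ()
{
  program_space parent_ps, child_ps;
  parent_ps.minimal_symbols["__n_pthreads"] = 0xf0807334UL;
  inferior parent = {11272598, &parent_ps};
  inferior child = {6553888, &child_ps};

  current_program_space = &parent_ps;
  unsigned long addr = 0;

  // Found in the parent's program space.
  assert (symbol_addr_for_inferior (parent, "__n_pthreads", &addr));
  // Not found in the child's (empty) program space.
  assert (!symbol_addr_for_inferior (child, "__n_pthreads", &addr));
  // The previously current program space is restored afterwards.
  assert (current_program_space == &parent_ps);
}
```

The guard restores the previous program space on every exit path, which is why the real ps_pglobal_lookup quoted above pairs scoped_restore_current_program_space with set_current_program_space instead of assigning the global directly.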