Hi Tom, Ulrich and community, I made a mistake by not removing a commented line in the last patch. I apologise for the same. Kindly ignore the same. Thank you for the feedback for the fix of this bug. Please find attached the patch. {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch}. So, I have fixed the bug and it works alright. Please find the test program, output with patch and without patch pasted below this email. >> We now have all variables {pd_able, pd_active and pd_session} now in a >> map of process ID and structure. This will help us make AIX GDB code >> easy to manage them per process in the aix-thread.c file. >I don't really know what this is about, but it's probably better to >attach the data directly to the inferior using the registry system. >(You can't use private_inferior as apparently that's reserved for the >process stratum.) >Search for registry for some examples. >It's better to pass in a ptid or even the aix_thread_variables object >itself than to rely on globals in low-level functions like this. So, I have taken care of this. Now we use the registry. Thank you for this suggestion. I was not knowing this. This is a very nice feature. >> Secondly, in the function pid_to_str () there is a beneath () call, >> which is why I had to put this function in rs6000-aix-nat.c file. >I wonder why it's necessary, as it seems to me that >aix_thread_target::pid_to_str should have already handled the 'thread' >case, so the inherited method ought to be good enough. This I have removed. I made a mistake while analysing this solution. Thank you for pointing it out. It works without it. Kindly check the output below. >Is there an existing gdb test case that exercises this code? >If not then it seems like a new test is warranted. This I am not aware of at least when I tried finding. What we need is a test case to check if the shared library is loaded for every new inferior born and the top target is set correctly in case of thread debugging. If something exists, I would like to know. >> + return object_bfd; >> + it++; >This doesn't look right to me at all. Using a global means that BFDs >from one inferior might "leak" to another, based solely on whether a >certain name was ever seen. Also nothing ever cleans out the global >vector. >It's better to attach this data to the relevant BFD using the registry >system, and not use a global at all. So we already attach this data using the lines here in the same function. std::string fname = string_printf ("%s%s", bfd_get_filename (archive_bfd.get ()), sep); bfd_set_filename (object_bfd.get (), fname.c_str ()); All we need to the right match for the name of the shared library. So, we already have a pathname variable. I used it and removed the vector. Kindly see it in the patch. You were right. There is nothing that could have clean that vector. Kindly give me feedback if we can do anything better or is incorrect. If not, kindly push this patch so that AIX folks can have a better debugging experience. Have a nice day ahead. Thanks and regards, Aditya. ------------------------------------------- Code:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else { child = fork (); if (child > 0) printf ("I am child \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; pthread_t thread[NUM_THREADS]; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { int res; res = pthread_create (&thread[i], NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); } return 0; } ------------------------ Output with patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 15728962)] I am parent [New inferior 3 (process 20382144)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. [Switching to Thread 1] 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info threads Id Target Id Frame * 1.1 Thread 1 (tid 34144675, running) 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 Thread 258 (tid 30146951, running) thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 1.3 Thread 515 (tid 37159321, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 2.1 process 15728962 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) 3.1 process 20382144 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info sharedlibrary From To Syms Read Shared Object Library 0xd05bc124 0xd05bf194 Yes (*) /usr/lib/libpthreads.a(shr_comm.o) 0xd05bb240 0xd05bb9a1 Yes (*) /usr/lib/libcrypt.a(shr.o) 0xd0576180 0xd05ba731 Yes (*) /usr/lib/libpthread.a(shr_xpg5.o) 0xd0100e00 0xd0575123 Yes (*) /usr/lib/libc.a(shr.o) (*): Shared library is missing debugging information. (gdb) inferior 2 [Switching to inferior 2 [process 15728962] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 15728962)] #0 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) ---------------------------- Output without patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 1] [New Thread 258] [New Thread 515] [New inferior 2 (process 11731200)] I am parent [New inferior 3 (process 16843200)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 11731200] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 11731200)] #0 0xd0594fc8 in ?? () (gdb) info threads Id Target Id Frame 1.1 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.3 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.4 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) * 2.1 process 11731200 0xd0594fc8 in ?? () 3.1 process 16843200 0xd0594fc8 in ?? () (gdb) info sharedlibrary warning: "/usr/lib/libpthreads.a": member "shr_comm.o" missing. warning: "/usr/lib/libcrypt.a": member "shr.o" missing. warning: "/usr/lib/libpthread.a": member "shr_xpg5.o" missing. warning: "/usr/lib/libc.a": member "shr.o" missing. warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/libpthreads.a(shr_comm.o). Use the "info sharedlibrary" command to see the complete listing. Do you need "set solib-search-path" or "set sysroot"? From To Syms Read Shared Object Library No /usr/lib/libpthreads.a(shr_comm.o) No /usr/lib/libcrypt.a(shr.o) No /usr/lib/libpthread.a(shr_xpg5.o) No /usr/lib/libc.a(shr.o) (gdb) ________________________________ From: Gdb-patches on behalf of Aditya Kamath1 via Gdb-patches Sent: 02 February 2023 11:54 To: Ulrich Weigand ; simark@simark.ca ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: [EXTERNAL] Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Hi Tom, Ulrich and community, Thank you for the feedback for the fix of this bug. Please find attached the patch. {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch}. So, I have fixed the bug and it works alright. Please find the test program, output with patch and without patch pasted below this email. >> We now have all variables {pd_able, pd_active and pd_session} now in a >> map of process ID and structure. This will help us make AIX GDB code >> easy to manage them per process in the aix-thread.c file. >I don't really know what this is about, but it's probably better to >attach the data directly to the inferior using the registry system. >(You can't use private_inferior as apparently that's reserved for the >process stratum.) >Search for registry for some examples. >It's better to pass in a ptid or even the aix_thread_variables object >itself than to rely on globals in low-level functions like this. So, I have taken care of this. Now we use the registry. Thank you for this suggestion. I was not knowing this. This is a very nice feature. >> Secondly, in the function pid_to_str () there is a beneath () call, >> which is why I had to put this function in rs6000-aix-nat.c file. >I wonder why it's necessary, as it seems to me that >aix_thread_target::pid_to_str should have already handled the 'thread' >case, so the inherited method ought to be good enough. This I have removed. I made a mistake while analysing this solution. Thank you for pointing it out. It works without it. Kindly check the output below. >Is there an existing gdb test case that exercises this code? >If not then it seems like a new test is warranted. This I am not aware of at least when I tried finding. What we need is a test case to check if the shared library is loaded for every new inferior born and the top target is set correctly in case of thread debugging. If something exists, I would like to know. >> + return object_bfd; >> + it++; >This doesn't look right to me at all. Using a global means that BFDs >from one inferior might "leak" to another, based solely on whether a >certain name was ever seen. Also nothing ever cleans out the global >vector. >It's better to attach this data to the relevant BFD using the registry >system, and not use a global at all. So we already attach this data using the lines here in the same function. std::string fname = string_printf ("%s%s", bfd_get_filename (archive_bfd.get ()), sep); bfd_set_filename (object_bfd.get (), fname.c_str ()); All we need to the right match for the name of the shared library. So, we already have a pathname variable. I used it and removed the vector. Kindly see it in the patch. You were right. There is nothing that could have clean that vector. Kindly give me feedback if we can do anything better or is incorrect. If not, kindly push this patch so that AIX folks can have a better debugging experience. Have a nice day ahead. Thanks and regards, Aditya. ------------------------------------------- Code:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else { child = fork (); if (child > 0) printf ("I am child \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; pthread_t thread[NUM_THREADS]; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { int res; res = pthread_create (&thread[i], NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); } return 0; } ------------------------ Output with patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 15728962)] I am parent [New inferior 3 (process 20382144)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. [Switching to Thread 1] 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info threads Id Target Id Frame * 1.1 Thread 1 (tid 34144675, running) 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 Thread 258 (tid 30146951, running) thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 1.3 Thread 515 (tid 37159321, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 2.1 process 15728962 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) 3.1 process 20382144 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info sharedlibrary From To Syms Read Shared Object Library 0xd05bc124 0xd05bf194 Yes (*) /usr/lib/libpthreads.a(shr_comm.o) 0xd05bb240 0xd05bb9a1 Yes (*) /usr/lib/libcrypt.a(shr.o) 0xd0576180 0xd05ba731 Yes (*) /usr/lib/libpthread.a(shr_xpg5.o) 0xd0100e00 0xd0575123 Yes (*) /usr/lib/libc.a(shr.o) (*): Shared library is missing debugging information. (gdb) inferior 2 [Switching to inferior 2 [process 15728962] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 15728962)] #0 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) ---------------------------- Output without patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 1] [New Thread 258] [New Thread 515] [New inferior 2 (process 11731200)] I am parent [New inferior 3 (process 16843200)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 11731200] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 11731200)] #0 0xd0594fc8 in ?? () (gdb) info threads Id Target Id Frame 1.1 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.3 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.4 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) * 2.1 process 11731200 0xd0594fc8 in ?? () 3.1 process 16843200 0xd0594fc8 in ?? () (gdb) info sharedlibrary warning: "/usr/lib/libpthreads.a": member "shr_comm.o" missing. warning: "/usr/lib/libcrypt.a": member "shr.o" missing. warning: "/usr/lib/libpthread.a": member "shr_xpg5.o" missing. warning: "/usr/lib/libc.a": member "shr.o" missing. warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/libpthreads.a(shr_comm.o). Use the "info sharedlibrary" command to see the complete listing. Do you need "set solib-search-path" or "set sysroot"? From To Syms Read Shared Object Library No /usr/lib/libpthreads.a(shr_comm.o) No /usr/lib/libcrypt.a(shr.o) No /usr/lib/libpthread.a(shr_xpg5.o) No /usr/lib/libc.a(shr.o) (gdb) ________________________________ From: Gdb-patches on behalf of Aditya Kamath1 via Gdb-patches Sent: 27 January 2023 20:10 To: Ulrich Weigand ; simark@simark.ca ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: [EXTERNAL] Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Hi Ulrich and community, Thank you for the feedback for the fix of this bug. Please find attached the patch. {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch} So, I have fixed the bug and it works alright. Please find the test program, output with patch and without patch pasted below this email. >+ if (s.find (member_name) != std::string::npos) >+ { >+ return object_bfd; >+ } >This matches the member name *anywhere* in the full >filename, >which could lead to spurious matches, I think. The test >should be more specific. This I have taken care in the patch. There are a few changes for which I want to explain below. We now have all variables {pd_able, pd_active and pd_session} now in a map of process ID and structure. This will help us make AIX GDB code easy to manage them per process in the aix-thread.c file. Secondly, in the function pid_to_str () there is a beneath () call, which is why I had to put this function in rs6000-aix-nat.c file. Third thing is previously if there was no object file, we would use pd_disable () to disable thread debugging. This is incorrect now that we support multiple inferiors. Since we rely on inferior_ptid with new object file function till a point, we must disable only when we mourn the inferior or a process dies. Otherwise, there is every chance we will disable thread debugging for a wrong inferior that can be currently inferior_ptid. It also creates a mess disabling the pd_active for the wrong inferior in cases where a new inferior is born who object file is being loaded. This change can be seen in the patch. I have written comments for the remaining changes in the patch. Kindly give me feedback if we can do anything better or is incorrect. If not, kindly push this patch so that AIX folks can have a better debugging experience. Have a nice day ahead. Thanks and regards, Aditya. ------------------------------------------- Code:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else { child = fork (); if (child > 0) printf ("I am child \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; pthread_t thread[NUM_THREADS]; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { int res; res = pthread_create (&thread[i], NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); } return 0; } --------------------------------------------------- Output with patch applied:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (Process 17498448)] I am parent [New inferior 3 (Process 11731454)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. [Switching to Thread 1] 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [Process 17498448] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (Process 17498448)] #0 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info threads Id Target Id Frame 1.1 Thread 1 (tid 25231849, running) 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 Thread 258 (tid 33227061, running) thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 1.3 Thread 515 (tid 23069149, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 * 2.1 Process 17498448 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) 3.1 Process 11731454 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info sharedlibrary From To Syms Read Shared Object Library 0xd05bc124 0xd05bf194 Yes (*) /usr/lib/libpthreads.a(shr_comm.o) 0xd05bb240 0xd05bb9a1 Yes (*) /usr/lib/libcrypt.a(shr.o) 0xd0576180 0xd05ba731 Yes (*) /usr/lib/libpthread.a(shr_xpg5.o) 0xd0100e00 0xd0575123 Yes (*) /usr/lib/libc.a(shr.o) (*): Shared library is missing debugging information. -------------------------------------------------------- Output without patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 1] [New Thread 258] [New Thread 515] [New inferior 2 (process 11731200)] I am parent [New inferior 3 (process 16843200)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 11731200] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 11731200)] #0 0xd0594fc8 in ?? () (gdb) info threads Id Target Id Frame 1.1 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.3 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.4 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) * 2.1 process 11731200 0xd0594fc8 in ?? () 3.1 process 16843200 0xd0594fc8 in ?? () (gdb) info sharedlibrary warning: "/usr/lib/libpthreads.a": member "shr_comm.o" missing. warning: "/usr/lib/libcrypt.a": member "shr.o" missing. warning: "/usr/lib/libpthread.a": member "shr_xpg5.o" missing. warning: "/usr/lib/libc.a": member "shr.o" missing. warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/libpthreads.a(shr_comm.o). Use the "info sharedlibrary" command to see the complete listing. Do you need "set solib-search-path" or "set sysroot"? From To Syms Read Shared Object Library No /usr/lib/libpthreads.a(shr_comm.o) No /usr/lib/libcrypt.a(shr.o) No /usr/lib/libpthread.a(shr_xpg5.o) No /usr/lib/libc.a(shr.o) (gdb) ________________________________ From: Ulrich Weigand Sent: 20 January 2023 20:14 To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Aditya Kamath1 wrote: >Inorder to resolve the same I request for one information. How can we iterate_over_threads >of a particular process. What is that function. Is there any built-in available?? >Kindly let me know and that should solve this issue. Instead of iterate_over_threads you could use the all_threads() iterator directly; this can be specialized to only return threads of one inferior like this: for (thread_info *tp : all_threads (proc_target, ptid_t (pid))) { ... } >Also kindly give me feedback on this patch if I need to change anything. I think this change in solib-aix.c is not quite correct: + std::string s = bfd_get_filename (object_bfd.get ()); + if (s.find (member_name) != std::string::npos) + { + return object_bfd; + } This matches the member name *anywhere* in the full filename, which could lead to spurious matches, I think. The test should be more specific. Bye, Ulrich