Hi Ulrich, Tom and community, Please find attached the patch. {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch} The previous email could not get to the community mailing list as it had a size constraint. So, the bug is fixed, and outputs are the same as pasted below in this email. In the patch commit message I have provided specific and detailed explaination of every change as much as possible. So honestly there was a bug left to fix in the previous email of the patch while we followed the child in particular. I figured it out as I was testing more deeply. Kindly see section "Solution Part 1: - " in the patch commit message where I have explained the same. Kindly suggest me for changes if needed. Otherwise kindly let me know if this is ready for commit. >Now if the application is >theraded and we only pass ptid_t (user_current_pid) to switch_to_thread () >it will crash as main thread looks different or is ptid_t (pid, 0, tid). > This part I don't quite understand yet - how/why does it crash? Kindly check the section "Solution Part 2: - " of the patch, where I have explained this. >Similarly, I agree that everything may currently "work" without >adding the equivalent change to pdc_write_memory, but most likely >this is simply because that callback may just not be used very much. Yes, I agree. We have the changed user_current_pid variable to thread so that we always switch in the right context. Kindly let me know if it is alright and any changes are necessary here. >Can you come up with a >message that maybe starts out with the high-level change >and goes from there into the specific >details.... Thanks! Sure, so this patch has it. Kindly suggest me if we can do this better. Have a nice day ahead. Thanks and regards, Aditya. -------------------------------------------------------------------------------- Code:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else { child = fork (); if (child > 0) printf ("I am child \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; pthread_t thread[NUM_THREADS]; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { int res; res = pthread_create (&thread[i], NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); } return 0; } ------------------------------------------------- Output with patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 15335754)] [New inferior 3 (process 8061404)] I am parent I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. [Switching to Thread 1] 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info sharedlibrary From To Syms Read Shared Object Library 0xd05bc124 0xd05bf194 Yes (*) /usr/lib/libpthreads.a(shr_comm.o) 0xd05bb240 0xd05bb9a1 Yes (*) /usr/lib/libcrypt.a(shr.o) 0xd0576180 0xd05ba731 Yes (*) /usr/lib/libpthread.a(shr_xpg5.o) 0xd0100e00 0xd0575123 Yes (*) /usr/lib/libc.a(shr.o) (*): Shared library is missing debugging information. (gdb) info threads Id Target Id Frame * 1.1 Thread 1 (tid 30671243, running) 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 Thread 258 (tid 34406781, running) thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 1.3 Thread 515 (tid 36307315, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 2.1 process 15335754 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) 3.1 process 8061404 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 15335754] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 15335754)] #0 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) ----------------------------------------------------------------------- Output without patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 1] [New Thread 258] [New Thread 515] [New inferior 2 (process 11731200)] I am parent [New inferior 3 (process 16843200)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 11731200] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 11731200)] #0 0xd0594fc8 in ?? () (gdb) info threads Id Target Id Frame 1.1 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.3 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.4 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) * 2.1 process 11731200 0xd0594fc8 in ?? () 3.1 process 16843200 0xd0594fc8 in ?? () (gdb) info sharedlibrary warning: "/usr/lib/libpthreads.a": member "shr_comm.o" missing. warning: "/usr/lib/libcrypt.a": member "shr.o" missing. warning: "/usr/lib/libpthread.a": member "shr_xpg5.o" missing. warning: "/usr/lib/libc.a": member "shr.o" missing. warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/libpthreads.a(shr_comm.o). Use the "info sharedlibrary" command to see the complete listing. Do you need "set solib-search-path" or "set sysroot"? From To Syms Read Shared Object Library No /usr/lib/libpthreads.a(shr_comm.o) No /usr/lib/libcrypt.a(shr.o) No /usr/lib/libpthread.a(shr_xpg5.o) No /usr/lib/libc.a(shr.o) (gdb) ________________________________ From: Aditya Kamath1 Sent: 10 February 2023 22:03 To: Ulrich Weigand ; simark@simark.ca ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Hi Ulrich, Tom and community, Please find attached the patch. {See: 0001-Fix-multi-thread-debug-bug-in-AIX.patch} Also find attached a document that I have proposed as a commit message. {See: Fix Multi Thread debug fix for AIX.pdf}.. This same document is used in the commit message of this patch. So the bug is fixed and test cases run alright. Kindly check the sample output of the same pasted below this email. In the document I have provided specific and detailed explaination of every change as much as possible. So honestly there was a bug left to fix in the previous email of the patch while we followed the child in particular. I figured it out as I was testing more deeply. Kindly see section "Solution Part 1: - " where I have explained the same. Kindly suggest me for changes if needed. Otherwise kindly let me know if this is ready for commit. >Previously we used switch_to_thread ().. Now if the application is >theraded and we only pass ptid_t (user_current_pid) to switch_to_thread () >it will crash as main thread looks different or is ptid_t (pid, 0, tid). > This part I don't quite understand yet - how/why does it crash? Kindly check "Solution Part 2: - " of the document, where I have explained this. >Similarly, I agree that everything may currently "work" without >adding the equivalent change to pdc_write_memory, but most likely >this is simply because that callback may just not be used very much. Yes, I agree. We have the changed user_current_pid variable to thread so that we always switch in the right context. Kindly let me know if it is alright and any changes are necessary here. >Can you come up with a >message that maybe starts out with the high-level change >(along the lines of "update aix-thread.c to handle threads in >multiple inferiors"), and goes from there into the specific >details (aix_thread_variables structure, handling only a >single inferior per sync_threadlists invocation, solib fixes >for multiple inferiors, ...)? Thanks! Sure, so the pdf attached in this email has it. Kindly suggest me if we can do this better. Have a nice day ahead. Thanks and regards, Aditya. -------------------------------------------------------------------------------- Code:- #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else { child = fork (); if (child > 0) printf ("I am child \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; pthread_t thread[NUM_THREADS]; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { int res; res = pthread_create (&thread[i], NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); } return 0; } ------------------------------------------------- Output with patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 15335754)] [New inferior 3 (process 8061404)] I am parent I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. [Switching to Thread 1] 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info sharedlibrary From To Syms Read Shared Object Library 0xd05bc124 0xd05bf194 Yes (*) /usr/lib/libpthreads.a(shr_comm.o) 0xd05bb240 0xd05bb9a1 Yes (*) /usr/lib/libcrypt.a(shr.o) 0xd0576180 0xd05ba731 Yes (*) /usr/lib/libpthread.a(shr_xpg5.o) 0xd0100e00 0xd0575123 Yes (*) /usr/lib/libc.a(shr.o) (*): Shared library is missing debugging information. (gdb) info threads Id Target Id Frame * 1.1 Thread 1 (tid 30671243, running) 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 Thread 258 (tid 34406781, running) thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 1.3 Thread 515 (tid 36307315, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 2.1 process 15335754 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) 3.1 process 8061404 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 15335754] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 15335754)] #0 0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) ----------------------------------------------------------------------- Output without patch:- Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 1] [New Thread 258] [New Thread 515] [New inferior 2 (process 11731200)] I am parent [New inferior 3 (process 16843200)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) inferior 2 [Switching to inferior 2 [process 11731200] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)] [Switching to thread 2.1 (process 11731200)] #0 0xd0594fc8 in ?? () (gdb) info threads Id Target Id Frame 1.1 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.3 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.4 process 15270316 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) * 2.1 process 11731200 0xd0594fc8 in ?? () 3.1 process 16843200 0xd0594fc8 in ?? () (gdb) info sharedlibrary warning: "/usr/lib/libpthreads.a": member "shr_comm.o" missing. warning: "/usr/lib/libcrypt.a": member "shr.o" missing. warning: "/usr/lib/libpthread.a": member "shr_xpg5.o" missing. warning: "/usr/lib/libc.a": member "shr.o" missing. warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/libpthreads.a(shr_comm.o). Use the "info sharedlibrary" command to see the complete listing. Do you need "set solib-search-path" or "set sysroot"? From To Syms Read Shared Object Library No /usr/lib/libpthreads.a(shr_comm.o) No /usr/lib/libcrypt.a(shr.o) No /usr/lib/libpthread.a(shr_xpg5.o) No /usr/lib/libc.a(shr.o) (gdb) ________________________________ From: Ulrich Weigand Sent: 09 February 2023 00:14 To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Aditya Kamath1 wrote: >>This seems unrelated to the rest of the changes at first glance. >>Why is this necessary? > >So, when we need to be in the right context when we read memory. Before >coming into the target wait, we switch_to_no_thread () due to which our >inferior_ptid is set to null. Our target_memory needs the correct >inferior_ptid. Also, in case we don't have a ptid_t (pid) and the >application is threaded we need the inferior_ptid to be set correctly >like shown in the patch. Understood. >Previously we used switch_to_thread ().. Now if the application is >theraded and we only pass ptid_t (user_current_pid) to switch_to_thread () >it will crash as main thread looks different or is ptid_t (pid, 0, tid). This part I don't quite understand yet - how/why does it crash? >>By comparison, the Linux version of this in proc-service.c also >>switches the current inferior and address space: > > scoped_restore_current_inferior restore_inferior; > > set_current_inferior (ph->thread->inf); > > scoped_restore_current_program_space restore_current_progspace; > > set_current_program_space (ph->thread->inf->pspace); > > scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid); > > inferior_ptid = ph->thread->ptid; >> so we should probably do the same for consistency. >So, kindly allow me to disagree with you on this. What is happening is in >inferior.c in do_target_wait1 () we call switch_to_inferior_no_thread ().. [snip] >Here we already set the correct current inferior and program space to >the same thing as that if we set in pdc_read_memory like linux. >So, it does not make any difference to add the changes like linux does Well, it does look like if you entered the callback in this particular context, the inferior may have already been set up correctly. However, in theory the callback could also be called in different contexts, and just as a precaution it would be preferable to have it always work correctly. The semantics of the callback is to read memory of a particular process as identified via the pthdb_user_t argument, and we should write the routine so that it always does what's needed to implement that semantics correctly. >Secondly, things work if we do not do the same for pdc_write_memory. >I have not seen anything not work. So, I don't think it is good to >add it there. What say?? Similarly, I agree that everything may currently "work" without adding the equivalent change to pdc_write_memory, but most likely this is simply because that callback may just not be used very much. But as a precaution, and to accommodate potential future changes e.g. in the libpthdebug.a library, if would be preferable to implement the semantics correctly. (Also, it just looks surprising to see the read and write implementation differ when there is no apparent reason why that should be the case.) >>This looks unnecessarily complicated. Isn't this just > > *g++ = tp; > >This I have changed. The code now looks like: >+ for (thread_info *tp : all_threads (proc_target, ptid_t (pid))) >+ { >+ *g = tp; >+ *g++; >+ } Which is weird, as *g++ dereferences g for no reason. This should simply be: for (thread_info *tp : all_threads (proc_target, ptid_t (pid))) *g++ = tp; >As far as the check gptid.is_pid () is concerned, I will suggest we >keep it there. If cmp_result is > 0 and we have a main process swap >to create a thread. Rest is same in the loop. The reason being handling >pi and gi variables becomes complex otherwise. When this swap happens, >we need to increment both pi and gi.. Because we have taken care of the >main threads in both pthread library and GDB. And this for loop is >executed only once. So, the first event is main process being >pthreaded. Once the swap happens pi and gi become one and since >gcount = pcount = 1 we exit the for loop. Thread addition events comes >after this. Hmm, handling the initial switch of a single PID-only thread to the PID/TID-style ptid_t separately before still seems a bit clearer to me. But in the end your proposed code looks correct now so I'd be fine with it as is, if you prefer. Except for the few things mentioned above, this now looks ready to be committed to me. However, I'm not sure the commit message fully describes the latest version of the patch, after we've gone through all those iterations ... Can you come up with a message that maybe starts out with the high-level change (along the lines of "update aix-thread.c to handle threads in multiple inferiors"), and goes from there into the specific details (aix_thread_variables structure, handling only a single inferior per sync_threadlists invocation, solib fixes for multiple inferiors, ...)? Thanks! Bye, Ulrich