Hi all, Please find attached the new patch. [Kindly see: 0001-Fix-multi-thread-bug-fix-in-AIX.patch]. This patch works for a single process with multiple threads and multiple process with detach on fork on. Kindly find the program1 and 2 below and the outputs named as output 1 and output 2 respectively. The issue:- When we ran the unit test cases, we realised we have a problem in the multi process detach on fork on case where this patch fails. The issue is when there is interrupt pressed while a thread is running [threads after using fork() and stuck in while (1);], it somehow manages to switch the process as shown highlighted in bold in output 3 below this email. Due to this what should have been shown as threads is now being shown as process. A detailed explaination on the solution proposed for single process:- In AIX we were adding a main thread named say 'Thread 1' extra when we already had a main process thread named say 'process 100'. This created a mess while sinking and with the recent change in the get_aix_thread_info () function where it won't allow any thread with no private data there by leading to a crash. So, the proposed patch[attached in this email] modifies our sync_threadlists () function where a check if a process has tid or not. If it has then we know my program is multi-threaded and I need to change main process thread ptid from 'process 100' to 'Thread 1' as Ulrich mentioned in the previous threads like ptid (pid, 0, 0) to ptid (pid, 0, tid or pthreadID). This is done by thread_change_ptid () in the patch. While this is done, there are cases where when we read target memory we need to be in the right context.. Hence, the proposed patch changes inferior_ptid in pdc_read_data() as mentioned by Ulrich. + inferior_ptid = ptid_t (user_current_pid); Here's the thing we were not counting our main thread in GDB thread lists [gcount].. Now we do so. The proposed patch has removed the condition. - if (PD_TID (thread->ptid)) Before we had an extra thread 'process 100' apart from 'thread 1'. So, in case someone interrupted a thread with ctrl+c.. In the pd_update () even if we don't have thread who signalled this interrupt when we return ptid_t (pid) it was fine. But now with no 'process 100' and only 'thread 1', we need to take care of interrupt as well, otherwise GDB core will take ptid_t(100) as a new process. So the change + if (thrinf.ti_cursig == SIGTRAP || thrinf.ti_cursig == SIGINT) These changes solve the single process case. A detailed explaination on the solution proposed for multi process with multi thread case:- Consider my main program is multi-threaded and each thread can fork().. After we record all events there are two things, we need to take care of in my rs6000-aix-nat.c.. 1:- There can be a child event whose parent can be pthreaded or multi-threaded. The negation of this is fine since we anyway can return only ptid_t (pid). In this case if we return a ptid_t (pid) then gdb core treats it as a new process since our parent thread is pthreaded GDB core only knows threads like ptid_t (pid, tid, pthid) and not ptid_t (pid). In order to avoid this, the proposed patch uses a function call find_the_return_ptid () to figure out the same. The changes like below are for this reason. ourstatus->set_forked (ptid_t (pid)); -            return ptid_t (parent_pid); +            return find_the_return_ptid (parent_pid); Kindly note that the control will always not come from aix-thread.c for such events. Hence, we cannot take care of the same there, though it will be a relief if we can do that. Having said that the propsed patch uses lwp parameter inorder to store the kernel thread ID so that we can fetch it here and use it here. 2:- There are cases where we need to pass my ptid_t (pid, lwp, tid) to aix-thread.c instead of ptid_t (pid).. If we try to fix 1 this way, then then below assertion causes us trouble if we go through aix-thread.c for our beneath->wait. Hence the proposed patch has it removed. - /* The target beneath does not deal with threads, so it should only return - pid-only ptids. */ - gdb_assert (ptid.is_pid ()); We are not able to know why our output 3 is switching to a process[ Output 3 in the mail below].. So, we are at one of the threads of process 1. On an interrupt we need to switch our threads to the thread that caused it. But the GDB core is saying we switched our process. It shouldn't as we are in the same process. And I also checked if we returned the correct ptid that raised the interrupt and we do. I tried to debug on what is going on in the GDB core but failed to understand for the last couple of days, and therefore I did not succeed. If I do not understand, then I will not be able to solve this problem for AIX and GDB. Hence, I wish to reach out to you all after this effort for a solution. Since you all are experts having vast experience and knowledge in GDB, kindly let me know what is going on in the output 3 case [pasted below in this email] or the things mentioned in the patch are done using a wrong method. If so, then let me know what we can do correctly and efficiently from here. Hope to seek a reply. Have a nice day ahead. Thanks and regards, Aditya ---------------------------------------------------------------- Program1:- [Credits gdb.threads/continuous-pending.c] #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 3 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); while (1); /* break here */ } int main (void) { int i; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { pthread_t thread; int res; res = pthread_create (&thread, NULL, thread_function, NULL); assert (res == 0); } while (1) sleep (1); return 0; } ---------------------------------------------------------------- Output1:- Single process Reading symbols from /home/aditya/gdb_tests/continue-pending-status... (gdb) r Starting program: /home/aditya/gdb_tests/continue-pending-status ^C[New Thread 258] [New Thread 515] [New Thread 772] Thread 3 received signal SIGINT, Interrupt. [Switching to Thread 515] thread_function (arg=0x0) at /home/aditya/gdb_tests/continue-pending-status.c:36 36 while (1); /* break here */ (gdb) info threads Id Target Id Frame 1 Thread 1 (tid 24838585, running) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 2 Thread 258 (tid 23134635, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/continue-pending-status.c:36 * 3 Thread 515 (tid 30146867, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/continue-pending-status.c:36 4 Thread 772 (tid 27853165, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/continue-pending-status.c:36 --------------------------------------------------------------------------------- Program 2:- Multi process Code #include #include #include #include #include pthread_barrier_t barrier; #define NUM_THREADS 2 void * thread_function (void *arg) { /* This ensures that the breakpoint is only hit after both threads are created, so the test can always switch to the non-event thread when the breakpoint triggers. */ pthread_barrier_wait (&barrier); pid_t child; child = fork (); if (child > 0) printf ("I am parent \n"); else{ printf (" Iam child \n"); child = fork (); if (child > 0) printf ("From child I became a parent \n"); else printf ("I am grandchild \n"); } while (1); /* break here */ } int main (void) { int i; alarm (300); pthread_barrier_init (&barrier, NULL, NUM_THREADS); for (i = 0; i < NUM_THREADS; i++) { pthread_t thread; int res; res = pthread_create (&thread, NULL, thread_function, NULL); assert (res == 0); } while (1) { sleep (15); break; } return 0; } ------------------------------------------------------------------------- Output 2:- With detach-on-fork on Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [Detaching after fork from child process 8323572] Iam child I am grandchild From child I became a parent I am parent [Detaching after fork from child process 11665884] Iam child I am grandchild From child I became a parent I am parent ^C Thread 2 received signal SIGINT, Interrupt. [Switching to Thread 258] thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 32 while (1); /* break here */ (gdb) info threads Id Target Id Frame 1 Thread 1 (tid 27263269, running) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) * 2 Thread 258 (tid 28705075, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 3 Thread 515 (tid 27853169, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32 ------------------------------------------------------------------------- Output 3:- With detach-on-fork off Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork... (gdb) set detach-on-fork off (gdb) r Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork [New Thread 258] [New Thread 515] [New inferior 2 (process 8323570)] I am parent [New inferior 3 (process 17957172)] I am parent ^C Thread 1.1 received signal SIGINT, Interrupt. [Switching to process 16777620] 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) (gdb) info threads Id Target Id Frame * 1.1 process 16777620 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.2 process 16777620 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 1.3 process 16777620 0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o) 2.1 process 8323570 0xd0594fc8 in ?? () 3.1 process 17957172 0xd0594fc8 in ?? () ________________________________ From: Gdb-patches on behalf of Aditya Kamath1 via Gdb-patches Sent: 24 November 2022 00:15 To: Ulrich Weigand ; simark@simark.ca ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: [EXTERNAL] Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Hi Ulrich, >static int >giter_count (struct thread_info *thread, void *countp) >{ > if (PD_TID (thread->ptid)) > (*(int *) countp)++; > return 0; >} >Maybe that comment is wrong about pthreaddebug not including >the main thread? Or maybe that changed between AIX versions? >In any case, something needs to be fixed here. Even if we fix it here [assuming we are succesful], in the delete_thread_1 () in thread.c we will fail to hit thread->deletable as true while we attempt delete_thread (gbuf [gi]).. Because refcount will not be 0 when we attempt to delete main thread with ptid (pid, 0, 0). {see func below} bool thread_info::deletable () const { /* If this is the current thread, or there's code out there that relies on it existing (refcount > 0) we can't delete yet. */ return refcount () == 0 && !is_current_thread (this); } I will be trying to replace the main thread instead like thread_change_ptid (proc_target, gptid, pptid) subject to a condition check that gptid.tid () == 0.. Otherwise, if it is not a main thread [gptid.tid () != 0], we can delete gbuf[gi].. We can apply this else where as well in sync_threadlists (). Let me know if we have an alternate optimal option that can delete this thread or why the solution in the above paragraph can fail.. Rest of the things I will handle. No problem. Thanks and regards, Aditya.. ________________________________ From: Ulrich Weigand Sent: 23 November 2022 22:39 To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org Cc: Sangamesh Mallayya Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch Aditya Kamath1 wrote: >Hmm. So, something is going wrong here.. >gcount = 0; > iterate_over_threads (giter_count, &gcount); > g = gbuf = XNEWVEC (struct thread_info *, gcount); > iterate_over_threads (giter_accum, &g); > qsort (gbuf, gcount, sizeof *gbuf, gcmp); Looks like this is deliberate: /* iterate_over_threads() callback for counting GDB threads. Do not count the main thread (whose tid is zero). This matches the list of threads provided by the pthreaddebug library, which does not include that main thread either, and thus allows us to compare the two lists. */ static int giter_count (struct thread_info *thread, void *countp) { if (PD_TID (thread->ptid)) (*(int *) countp)++; return 0; } Maybe that comment is wrong about pthreaddebug not including the main thread? Or maybe that changed between AIX versions? In any case, something needs to be fixed here. Bye, Ulrich