Hi Ulrich, Tom and community,

Please find attached the patch. I have written my answers to the previous comments below. Kindly let me know if we need more changes. If not, kindly push this to the community code. The sample program and output are pasted below this email.

>I think this would fit better into gdb.threads, given that this is about the
>interaction of multiple inferiors with the threading library on AIX.

I will do this immediately after this patch is done.

>> if (user_current_pid != 0)
>>+ inferior_ptid = ptid_t (user_current_pid);

>This seems unrelated to the rest of the changes at first glance.
>Why is this necessary?

We need to be in the right context when we read memory. Before coming into the target wait, we call switch_to_no_thread (), due to which inferior_ptid is set to null. The target memory read needs the correct inferior_ptid. Also, if there is no thread with ptid_t (pid) because the application is threaded, we still need inferior_ptid to be set correctly, as shown in the patch. Previously we used switch_to_thread (). Now, if the application is threaded and we pass only ptid_t (user_current_pid) to switch_to_thread (), it will crash, because the main thread looks different: it is ptid_t (pid, 0, tid). Hence we set inferior_ptid directly, which is simpler.

>Also, is the "user_current_pid != 0" check even still needed given
>the change to pd_enable() below?

I have removed this. You were right.

>By comparison, the Linux version of this in proc-service.c also
>switches the current inferior and address space:
>
>  scoped_restore_current_inferior restore_inferior;
>  set_current_inferior (ph->thread->inf);
>
>  scoped_restore_current_program_space restore_current_progspace;
>  set_current_program_space (ph->thread->inf->pspace);
>
>  scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
>  inferior_ptid = ph->thread->ptid;
>
>so we should probably do the same for consistency.

Kindly allow me to disagree with you on this.

What happens is that do_target_wait_1 () in infrun.c calls switch_to_inferior_no_thread (). The function is as follows:

  void
  switch_to_inferior_no_thread (inferior *inf)
  {
    set_current_inferior (inf);
    switch_to_no_thread ();
    set_current_program_space (inf->pspace);
  }

This already sets the current inferior and program space to the same values we would set in pdc_read_data if we followed the Linux code, so adding those changes makes no difference. switch_to_no_thread () sets inferior_ptid to null, and that is why we set only inferior_ptid in pdc_read_data and nothing else. So I suggest we stick to this plan.

Secondly, things work without doing the same in pdc_write_data. I have not seen anything fail, so I don't think it is good to add it there. What say?

>This looks unnecessarily complicated. Isn't this just
>  *g++ = tp;

This I have changed.

>Is this a change in behavior to current GDB? I thought if the
>application (whether a single inferior or one of multiple inferiors)
>is threaded in the sense that it uses the libpthread library we
>wanted to show it as threaded, so that the user can e.g. see the
>thread ID in info threads.

You are right. I had read somewhere, I cannot recall where, that we need to show an inferior as threaded only when it has multiple threads. I checked the Linux output and it is what you mentioned. I have removed the gcount == 1 && pcount == 1 condition.

>This logic is still confusing me. Why is the
>  gptid.pid () == pptid.pid ()
>check still needed? I thought we now collected only threads
>of a single process to begin with, so they all ought to have
>the same PID?

>Also, if the point is the gptid.is_pid () check, this can
>really only happen once per inferior, as it is switched
>from non-threaded to threaded mode, right?

I removed the gptid.pid () == pptid.pid () condition. The reason I had added it was that gcount (the thread count per process) was not per process before, so I was worried about swapping the wrong process.
Now we do not need it.

As far as the gptid.is_pid () check is concerned, I suggest we keep it in the loop. If cmp_result is > 0 and the GDB entry is still the main process, we swap it to create a thread. The rest of the loop stays the same. The reason is that handling the pi and gi variables becomes complex otherwise. When this swap happens we need to increment both pi and gi, because we have taken care of the main thread in both the pthread library list and the GDB list. In that case the for loop is executed only once: the first event is the main process becoming pthreaded. Once the swap happens, pi and gi both become one, and since gcount = pcount = 1 we exit the for loop. Thread addition events come after this.

>That should just be "if (ptid.tid () == 0)" then.

This is done.

>- pd_deactivate ();
>+ pd_disable ();

>Why is this necessary? If it is, do we even need two
>separate pd_deactivate and pd_disable routines any more?

When the process exits, all its threads also exit in mourn_inferior, so we disable everything. Yes, I removed pd_deactivate ().

>>+ if (s.find ('(') != std::string::npos
>>+ && s.find (member_name) != std::string::npos)
>>+ return object_bfd;

>Ah, I guess you also need to ensure the member_name follows
>immediately after the '(', otherwise there could be confusion
>if the member name happens to be part of the file name as well.

This I have changed as you suggested.

Kindly check the patch and let me know :)

Have a nice day ahead.

Thanks and regards,
Aditya.

-------------------------------------------
Code:-

/* The header names were lost from the pasted program; the five below
   are the ones this code needs.  */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t child;

  child = fork ();
  if (child > 0)
    printf ("I am parent \n");
  else
    {
      child = fork ();
      if (child > 0)
        printf ("I am child \n");
      else
        printf ("I am grandchild \n");
    }

  while (1); /* break here */
}

int
main (void)
{
  int i;
  pthread_t thread[NUM_THREADS];

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      int res;

      res = pthread_create (&thread[i], NULL, thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    {
      sleep (15);
    }

  return 0;
}

-------------------------------------------------
Output with patch:-

Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 15991124)]
I am parent
[New inferior 3 (process 20840796)]
I am parent
^Cin
Thread 1.1 received signal SIGINT, Interrupt.
[Switching to Thread 1]
0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb) info threads
  Id   Target Id                            Frame
* 1.1  Thread 1 (tid 33947921, running)     0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
  1.2  Thread 258 (tid 37421465, running)   thread_function (arg=0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32
  1.3  Thread 515 (tid 32899441, running)   thread_function (arg=0x0) 0x0) at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32
  2.1  Thread 515 (tid 33751493, running)   0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o)
  3.1  Thread 258 (tid 34931151, running)   0xd0594fc8 in _sigsetmask () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb)

-----------------------------------------------------------------------
Output without patch:-

Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork
[New Thread 1]
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 11731200)]
I am parent
[New inferior 3 (process 16843200)]
I am parent
^C
Thread 1.1 received signal SIGINT, Interrupt.
0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
(gdb) inferior 2
[Switching to inferior 2 [process 11731200] (/home/aditya/gdb_tests/ultimate-multi-thread-fork)]
[Switching to thread 2.1 (process 11731200)]
#0 0xd0594fc8 in ?? ()
(gdb) info threads
  Id   Target Id          Frame
  1.1  process 15270316   0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
  1.2  process 15270316   0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
  1.3  process 15270316   0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
  1.4  process 15270316   0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)
* 2.1  process 11731200   0xd0594fc8 in ?? ()
  3.1  process 16843200   0xd0594fc8 in ?? ()
(gdb) info sharedlibrary
warning: "/usr/lib/libpthreads.a": member "shr_comm.o" missing.
warning: "/usr/lib/libcrypt.a": member "shr.o" missing.
warning: "/usr/lib/libpthread.a": member "shr_xpg5.o" missing.
warning: "/usr/lib/libc.a": member "shr.o" missing.
warning: Could not load shared library symbols for 4 libraries, e.g. /usr/lib/libpthreads.a(shr_comm.o).
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
From  To  Syms Read  Shared Object Library
          No         /usr/lib/libpthreads.a(shr_comm.o)
          No         /usr/lib/libcrypt.a(shr.o)
          No         /usr/lib/libpthread.a(shr_xpg5.o)
          No         /usr/lib/libc.a(shr.o)
(gdb)

________________________________
From: Ulrich Weigand
Sent: 07 February 2023 00:37
To: simark@simark.ca ; Aditya Kamath1 ; gdb-patches@sourceware.org
Cc: Sangamesh Mallayya
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch

Aditya Kamath1 wrote:

>>I think the question here is simply whether, if you run the
>>test suite both without and with your patch, are any of the
>>FAILs fixed with the patch? If not, it would be good to
>>create a new test that fails without the patch and succeeds
>>with it, and add that to the test suite.
>
>So, this is something new to me. We will add it as a continuation in the same thread
>after this patch. I will need one information. Which test suite will we add it in?
>gdb.threads or gdb.base? Also, kindly suggest a simple test case that is written that
>I can see and learn. Any simple hello_world program will do. I want to understand how
>that exp file is written and how it compares to tell if a test case is pass or fail.

I think this would fit better into gdb.threads, given that this is about the interaction of multiple inferiors with the threading library on AIX.

I'd just look at existing test cases in that directory. For simple tests, we usually have a .c file and a .exp file with the same name. The .exp file starts out with instructions to build the test case, and start it up under GDB. Then follow a series of test statements which are verified against the output of the GDB under test.

As a simple example in a related area, you can look e.g. at fork-child-threads.{c,exp}.

>Kindly give me feedback for this patch, incase we can do anything better
>or is incorrect.

Some comments:

>@@ -508,14 +550,13 @@ pdc_read_data (pthdb_user_t user_current_pid, void *buf,
>    /* This is needed to eliminate the dependency of current thread
>       which is null so that thread reads the correct target memory. */
>    {
>-     scoped_restore_current_thread restore_current_thread;
>+     scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
>      /* Before the first inferior is added, we pass inferior_ptid.pid ()
>         from pd_enable () which is 0. There is no need to switch threads
>         during first initialisation. In the rest of the callbacks the
>         current thread needs to be correct. */
>      if (user_current_pid != 0)
>-       switch_to_thread (current_inferior ()->process_target (),
>-                         ptid_t (user_current_pid));
>+       inferior_ptid = ptid_t (user_current_pid);
>      status = target_read_memory (addr, (gdb_byte *) buf, len);
>    }

This seems unrelated to the rest of the changes at first glance. Why is this necessary?

Also, is the "user_current_pid != 0" check even still needed given the change to pd_enable() below?

By comparison, the Linux version of this in proc-service.c also switches the current inferior and address space:

  scoped_restore_current_inferior restore_inferior;
  set_current_inferior (ph->thread->inf);

  scoped_restore_current_program_space restore_current_progspace;
  set_current_program_space (ph->thread->inf->pspace);

  scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
  inferior_ptid = ph->thread->ptid;

so we should probably do the same for consistency.

Also, the same logic will be required in pdc_write_data, where it is currently missing completely.

>+  for (thread_info *tp : all_threads (proc_target, ptid_t (pid)))
>+    {
>+      **(struct thread_info ***) &g = tp;
>+      (*(struct thread_info ***) &g)++;
>+    }

This looks unnecessarily complicated. Isn't this just

  *g++ = tp;

?

>+  /* If there is only one thread then we need not make the main
>+     thread look like a thread. It can stay as a process. This
>+     is useful when we have multiple inferiors, but only one is
>+     threaded. So we need not make the other inferiors with only
>+     main thread, look like a threaded one. For example, Thread
>+     1.1, 1.2, 2.1, 3.1 exists then it is useful to skip this for
>+     loop for 2.1 and 3.1 leaving them as main process thread with
>+     a dummy priv set. */
>+
>+  if (pcount == 1 && gcount == 1)
>+    {
>+      aix_thread_info *priv = new aix_thread_info;
>+      tp = find_thread_ptid (proc_target, gptid);
>+      tp->priv.reset (priv);
>+      break;
>+    }

Is this a change in behavior to current GDB? I thought if the application (whether a single inferior or one of multiple inferiors) is threaded in the sense that it uses the libpthread library we wanted to show it as threaded, so that the user can e.g. see the thread ID in info threads.

>+  /* This is to make the main process thread now look
>+     like a thread. */
>+
>+  if (gptid.is_pid () && gptid.pid () == pptid.pid ())
>+    {
>+      thread_change_ptid (proc_target, gptid, pptid);
>+      aix_thread_info *priv = new aix_thread_info;
>+      priv->pdtid = pbuf[pi].pdtid;
>+      priv->tid = pbuf[pi].tid;
>+      tp = find_thread_ptid (proc_target, pptid);
>+      tp->priv.reset (priv);
>+      pi++;
>+      gi++;
>+    }
>+  else
>+    {
>+      delete_thread (gbuf[gi]);
>+      gi++;
>+    }

This logic is still confusing me. Why is the

  gptid.pid () == pptid.pid ()

check still needed? I thought we now collected only threads of a single process to begin with, so they all ought to have the same PID?

Also, if the point is the gptid.is_pid () check, this can really only happen once per inferior, as it is switched from non-threaded to threaded mode, right? Maybe it would simplify the logic to have all that (including the code under if (pcount == 1 && gcount == 1) above if it is actually needed) in a separate statement before that loop. I.e. directly before the loop, have a separate check whether the current process only has a single thread, whose ptid_t is still in the pid-only format, and if so, upgrade it to full TID format using the main thread's TID. Only after that, go through the loop to handle any other threads we may also have. (At that point, all GDB threads should already always be in TID format.)

>-  if (!PD_TID (ptid))
>+  if (!(ptid.tid () != 0))

That should just be "if (ptid.tid () == 0)" then. (Here and in a few other places.)

>@@ -1741,7 +1823,7 @@ aix_thread_target::mourn_inferior ()
> {
>   target_ops *beneath = this->beneath ();
>
>-  pd_deactivate ();
>+  pd_disable ();
>   beneath->mourn_inferior ();
> }

Why is this necessary? If it is, do we even need two separate pd_deactivate and pd_disable routines any more?

>@@ -618,6 +618,16 @@ solib_aix_bfd_open (const char *pathname)
>       if (member_name == bfd_get_filename (object_bfd.get ()))
>         break;
>
>+      std::string s = bfd_get_filename (object_bfd.get ());
>+
>+      /* For every inferior after first int bfd system we
>+         will have the pathname instead of the member name
>+         registered. Hence the below condition exists. */
>+
>+      if (s.find ('(') != std::string::npos
>+          && s.find (member_name) != std::string::npos)
>+        return object_bfd;

Ah, I guess you also need to ensure the member_name follows immediately after the '(', otherwise there could be confusion if the member name happens to be part of the file name as well.

Bye,
Ulrich
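
A minimal sketch of the stricter member-name check raised in the last comment above, in plain standard C++. It is not the code from the patch and uses no GDB or BFD calls. The helper name names_archive_member is hypothetical, and the sketch assumes the registered file name has the form "archive(member)", as in /usr/lib/libpthread.a(shr_xpg5.o) from the logs. It is also stricter than the review asks for: it requires the member name to fill the whole span between the last '(' and a trailing ')', rather than only to start right after the '('.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of GDB: returns true when FILENAME has
// the form "<archive>(<member>)" and the text between the last '(' and
// the trailing ')' is exactly MEMBER.
bool
names_archive_member (const std::string &filename, const std::string &member)
{
  // Assumption: an archive member reference always ends in ')'.
  if (filename.empty () || filename.back () != ')')
    return false;

  std::string::size_type open = filename.rfind ('(');
  if (open == std::string::npos)
    return false;

  // Extract the text strictly between '(' and the trailing ')'.
  std::string inside
    = filename.substr (open + 1, filename.size () - open - 2);
  return inside == member;
}

// Small self-check showing the case the two independent find () calls
// would get wrong: the member name appears only in the directory part.
void
check_examples ()
{
  assert (names_archive_member ("/usr/lib/libpthread.a(shr_xpg5.o)",
                                "shr_xpg5.o"));
  assert (!names_archive_member ("/tmp/shr.o_dir/libfoo.a(other.o)",
                                 "shr.o"));
}
```

With two independent find () calls, the second path above would match member "shr.o" because the name occurs in the directory component. Comparing only the text inside the parentheses avoids that confusion.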