From: Simon Marchi
Date: Wed, 6 Jul 2022 13:50:54 -0400
Subject: Re: Fw: RE: [PATCH] Fix assert pid != 0 assertion failure in AIX
To: Aditya Vidyadhar Kamath, Ulrich Weigand, Joel Brobecker via Gdb-patches
Cc: Sangamesh Mallayya
References: <5f142468-bc68-9128-d4d6-80cf36f12a48@polymtl.ca> <87169b93-8be2-5ccd-6b58-51b395a367bd@polymtl.ca> <4516dbf7-2655-39c5-0614-8235df05248e@polymtl.ca> <0ad5c21e-60fa-e52d-f70c-d2bc62e0ac74@polymtl.ca>
List-Id: Gdb-patches mailing list

On 7/6/22 00:25, Aditya Vidyadhar Kamath wrote:
>
> Morning Simon.
>
> The reason we were adding one more inferior_ptid!= 0 , condition is the
> previous condition in the and logic i.e. pid != inferior_ptid.pid() will
> satisfy as -1 is not equal to 0. [inferior_ptid is set to null before
> coming into wait]. So, in the next iteration since the process has
> exited waitpid (), will lead to ERRCHILD error though the current
> iteration fetched the right pid using waitpid ().
>
> However, we get your point that inferior_ptid should not be used
> initially [for any condition check till we fetch the pid using
> waitpid ()] as it is being reset.
>
> Please find attached our modified patch where we do a check of the
> inferior being in the list. I hope this solution matches to what you
> suggested.
>
> [See 0001-Fix-gdb_assert-pid-0-assertion-failure-in-AIX.patch file attached to this email]
>
> Have a great day.
>
> Thanks and regards,
> Aditya.
>
> From a1c5ab5338a5d46eab675a85c28a9b00256d395a Mon Sep 17 00:00:00 2001
> From: "aditya@ibm"
> Date: Tue, 5 Jul 2022 23:05:18 -0500
> Subject: [PATCH] Fix gdb_assert (pid != 0); assertion failure in AIX
>
> ---
>  gdb/aix-thread.c     | 2 ++
>  gdb/rs6000-aix-nat.c | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
> index ecd8200b692..e5c287a3fad 100644
> --- a/gdb/aix-thread.c
> +++ b/gdb/aix-thread.c
> @@ -1091,6 +1091,8 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
>    if (ptid.pid () == -1)
>      return ptid_t (-1);
>
> +  inferior_ptid = ptid;

I get why you are doing this, because the other functions (pd_update
and friends) then use it.  However, it would be nicer to just change
them all to not use inferior_ptid, but take whatever information is
needed through parameters.  See below.

> +
>    /* Check whether libpthdebug might be ready to be initialized.  */
>    if (!pd_active && status->kind () == TARGET_WAITKIND_STOPPED
>        && status->sig () == GDB_SIGNAL_TRAP)
> diff --git a/gdb/rs6000-aix-nat.c b/gdb/rs6000-aix-nat.c
> index 8563aea313a..24071a3742f 100644
> --- a/gdb/rs6000-aix-nat.c
> +++ b/gdb/rs6000-aix-nat.c
> @@ -525,11 +525,11 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>
>            /* Claim it exited with unknown signal.  */
>            ourstatus->set_signalled (GDB_SIGNAL_UNKNOWN);
> -          return inferior_ptid;
> +          return ptid_t(pid);

This is not right, as we return a "signalled" event with a
minus_one_ptid (pid is -1 here).  This is unexpected to the core of
GDB, because a "signalled" event means that some inferior has received
a fatal signal.  So the returned ptid should say which inferior
received the signal.

The code in rs6000_nat_target::wait appears to have been copied from
inf_ptrace_target::wait (the base class of rs6000_nat_target).  In
inf_ptrace_target::wait, that snippet has been changed to return an
"ignore" status in that case, so I suppose we should do that here.  The
change in inf_ptrace_target::wait was done here:

  https://gitlab.com/gnutools/binutils-gdb/-/commit/85e8c48c73a5c39a6980f9b2bd16ec96062fc4c3

See my patch below.

>          }
>
>        /* Ignore terminated detached child processes.  */
> -      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
> +      if (!WIFSTOPPED (status) && find_inferior_pid(this,pid) == NULL)

I think this is correct.  Just make sure to add spaces where
appropriate.  And we prefer nullptr over NULL for new code.  See my
patch below.

I managed to build GDB on gcc119, on the GCC compile farm, and wrote
the patch below, which at least fixes things enough to be able to debug
a simple program.  I tried a multi-threaded program, and while gdb did
not crash, I was not able to see the multiple threads, so there's more
work to do.  But at least, this should be a good starting point.

Please let me know what you think.
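
For reference, the two rs6000_nat_target::wait changes discussed above
(look the pid up among known inferiors, and report "ignore" rather than
"signalled" when no child is left) can be sketched outside of GDB as
below.  This is only a standalone illustration under stated
assumptions: wait_kind, wait_result, fake_event and sketch_wait are
hypothetical names, the event queue stands in for successive waitpid
results, and the std::set stands in for GDB's inferior list.

```cpp
#include <cassert>
#include <deque>
#include <set>

// Hypothetical stand-ins for GDB concepts; none of these are GDB's types.
enum class wait_kind { event, ignore };

struct wait_result
{
  wait_kind kind;
  int pid;   // -1 plays the role of minus_one_ptid
};

// One simulated waitpid result.
struct fake_event
{
  int pid;
  bool stopped;   // true if WIFSTOPPED would be true for this status
};

// Return the first event that belongs to a known process.  Exit events
// for unknown pids (terminated detached children) are skipped.  When the
// queue is empty (simulating waitpid failing because no child is left),
// report "ignore" with pid -1 instead of claiming that some process was
// killed by a signal.
wait_result
sketch_wait (std::deque<fake_event> &events, const std::set<int> &known_pids)
{
  for (;;)
    {
      if (events.empty ())
        return { wait_kind::ignore, -1 };

      fake_event ev = events.front ();
      events.pop_front ();

      // Same invariant as the `pid != 0' assertion in find_inferior_pid.
      assert (ev.pid != 0);

      if (!ev.stopped && known_pids.count (ev.pid) == 0)
        continue;   // terminated detached child, ignore it

      return { wait_kind::event, ev.pid };
    }
}
```

The point of the sketch is that the decision depends only on the pid
that was just reported and on the set of known processes, never on a
global "current" pid that may have been reset before the call.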

Simon


From e9e35416a45a8454dc87cabf9462e6cf4040d088 Mon Sep 17 00:00:00 2001
From: Simon Marchi
Date: Wed, 6 Jul 2022 13:39:22 -0400
Subject: [PATCH] gdb: fix {rs6000_nat_target,aix_thread_target}::wait to not use inferior_ptid

Trying to run a simple program (empty main) on AIX, I get:

    (gdb) run
    Starting program: /scratch/simark/build/gdb/a.out
    Child process unexpectedly missing: There are no child processes..
    ../../src/binutils-gdb/gdb/inferior.c:304: internal-error: find_inferior_pid: Assertion `pid != 0' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    ----- Backtrace -----
    0x10ef12a8 gdb_internal_backtrace_1()
            ../../src/binutils-gdb/gdb/bt-utils.c:122
    0x10ef1470 gdb_internal_backtrace()
            ../../src/binutils-gdb/gdb/bt-utils.c:168
    0x1004d368 internal_vproblem(internal_problem*, char const*, int, char const*, char*)
            ../../src/binutils-gdb/gdb/utils.c:396
    0x1004d8a8 internal_verror(char const*, int, char const*, char*)
            ../../src/binutils-gdb/gdb/utils.c:476
    0x1004c424 internal_error(char const*, int, char const*, ...)
            ../../src/binutils-gdb/gdbsupport/errors.cc:55
    0x102ab344 find_inferior_pid(process_stratum_target*, int)
            ../../src/binutils-gdb/gdb/inferior.c:304
    0x102ab4a4 find_inferior_ptid(process_stratum_target*, ptid_t)
            ../../src/binutils-gdb/gdb/inferior.c:318
    0x1061bae8 find_thread_ptid(process_stratum_target*, ptid_t)
            ../../src/binutils-gdb/gdb/thread.c:519
    0x10319e98 handle_inferior_event(execution_control_state*)
            ../../src/binutils-gdb/gdb/infrun.c:5532
    0x10315544 fetch_inferior_event()
            ../../src/binutils-gdb/gdb/infrun.c:4221
    0x10952e34 inferior_event_handler(inferior_event_type)
            ../../src/binutils-gdb/gdb/inf-loop.c:41
    0x1032640c infrun_async_inferior_event_handler(void*)
            ../../src/binutils-gdb/gdb/infrun.c:9548
    0x10673188 check_async_event_handlers()
            ../../src/binutils-gdb/gdb/async-event.c:335
    0x1066fce4 gdb_do_one_event()
            ../../src/binutils-gdb/gdbsupport/event-loop.cc:214
    0x10001a94 start_event_loop()
            ../../src/binutils-gdb/gdb/main.c:411
    0x10001ca0 captured_command_loop()
            ../../src/binutils-gdb/gdb/main.c:471
    0x10003d74 captured_main(void*)
            ../../src/binutils-gdb/gdb/main.c:1329
    0x10003e48 gdb_main(captured_main_args*)
            ../../src/binutils-gdb/gdb/main.c:1344
    0x10000744 main
            ../../src/binutils-gdb/gdb/gdb.c:32
    ---------------------
    ../../src/binutils-gdb/gdb/inferior.c:304: internal-error: find_inferior_pid: Assertion `pid != 0' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    Quit this debugging session? (y or n)

This is due to some bit-rot in the AIX port, still relying on the entry
value of inferior_ptid in the wait methods.

Problem #1 is in rs6000_nat_target::wait, here:

    /* Ignore terminated detached child processes.  */
    if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
      pid = -1;

At this point, waitpid has returned an "exited" status for some pid, so
pid is non-zero.  Since inferior_ptid is set to null_ptid on entry, the
pid returned by wait is not equal to `inferior_ptid.pid ()`, so we reset
pid to -1 and go to waiting again.  Since there are no more children to
wait for, waitpid then returns -1 so we get here:

    if (pid == -1)
      {
        gdb_printf (gdb_stderr,
                    _("Child process unexpectedly missing: %s.\n"),
                    safe_strerror (save_errno));

        /* Claim it exited with unknown signal.  */
        ourstatus->set_signalled (GDB_SIGNAL_UNKNOWN);
        return inferior_ptid;
      }

We therefore return a "signalled" status with a null_ptid (again,
inferior_ptid is null_ptid).  This confuses infrun, because if the
target returns a "signalled" status, it should be coupled with a ptid
for an inferior that exists.

So, the first step is to fix the snippets above to not use
inferior_ptid.  In the first snippet, use find_inferior_pid to see if
we know the event process.  If there is no inferior with that pid, we
assume it's a detached child process, so we ignore the event.  That
should be enough to fix the problem, because it should make it so we
won't go into the second snippet.  But still, fix the second snippet to
return an "ignore" status.  This is copied from inf_ptrace_target::wait,
which is where rs6000_nat_target::wait appears to be copied from in the
first place.

These changes are not sufficient, as the aix_thread_target, which sits
on top of rs6000_nat_target, also relies on inferior_ptid.
aix_thread_target::wait, by calling pd_update, assumes that
rs6000_nat_target has set inferior_ptid to the appropriate value (the
ptid of the event thread), but that's not the case.  pd_update returns
inferior_ptid - null_ptid - and therefore aix_thread_target::wait
returns null_ptid too, and we still hit the assert shown above.

Fix this by changing pd_activate, pd_update, sync_threadlists and
get_signaled_thread to all avoid using inferior_ptid.  Instead, they
accept as a parameter the pid of the process we are working on.
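
The contract described above for pd_update (take the pid as a
parameter, return the event thread's ptid if one can be found, else a
pid-only ptid) can be illustrated with the following standalone sketch.
It is not the GDB code: sketch_ptid, sketch_thread and sketch_update are
hypothetical names, and the got_trap flag is an assumed simplification
of what get_signaled_thread determines through getthrds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for GDB's ptid_t (pid + thread id).
struct sketch_ptid
{
  int pid;
  unsigned long tid;   // 0 means "pid-only"

  bool is_pid () const { return tid == 0; }
};

struct sketch_thread
{
  sketch_ptid ptid;
  bool got_trap;   // assumed flag: this thread just received the trap
};

// Return the ptid of the event thread of process PID if one can be
// found in THREADS, else a pid-only ptid for PID.  No global state is
// consulted, so the result cannot silently degrade to a null ptid.
sketch_ptid
sketch_update (int pid, const std::vector<sketch_thread> &threads)
{
  assert (pid != 0);

  for (const sketch_thread &t : threads)
    if (t.ptid.pid == pid && t.got_trap)
      return t.ptid;

  return sketch_ptid { pid, 0 };
}
```

Because the fallback is built from the pid argument, the caller always
gets back a ptid naming a process it knows about, which is the property
infrun relies on.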

With this patch, I am able to run the program to completion:

    (gdb) r
    Starting program: /scratch/simark/build/gdb/a.out
    [Inferior 1 (process 11010794) exited normally]

As well as break on main:

    (gdb) b main
    Breakpoint 1 at 0x1000036c
    (gdb) r
    Starting program: /scratch/simark/build/gdb/a.out

    Breakpoint 1, 0x1000036c in main ()
    (gdb) c
    Continuing.
    [Inferior 1 (process 26083688) exited normally]

Change-Id: I7c2613bbefe487d75fa1a0c0994423471d961ee9
---
 gdb/aix-thread.c     | 59 +++++++++++++++++++++-----------------------
 gdb/rs6000-aix-nat.c |  7 +++---
 2 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index ecd8200b6928..d47f5132592a 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -701,14 +701,14 @@ gcmp (const void *t1v, const void *t2v)
    Return 0 if none found.  */
 
 static pthdb_tid_t
-get_signaled_thread (void)
+get_signaled_thread (int pid)
 {
   struct thrdsinfo64 thrinf;
   tid_t ktid = 0;
 
   while (1)
     {
-      if (getthrds (inferior_ptid.pid (), &thrinf,
+      if (getthrds (pid, &thrinf,
                     sizeof (thrinf), &ktid, 1) != 1)
         break;
 
@@ -734,9 +734,9 @@ get_signaled_thread (void)
    have difficulty with certain call patterns */
 
 static void
-sync_threadlists (void)
+sync_threadlists (int pid)
 {
-  int cmd, status, infpid;
+  int cmd, status;
   int pcount, psize, pi, gcount, gi;
   struct pd_thread *pbuf;
   struct thread_info **gbuf, **g, *thread;
@@ -790,8 +790,6 @@ sync_threadlists (void)
   qsort (gbuf, gcount, sizeof *gbuf, gcmp);
 
   /* Apply differences between the two arrays to GDB's thread list.  */
-
-  infpid = inferior_ptid.pid ();
   for (pi = gi = 0; pi < pcount || gi < gcount;)
     {
       if (pi == pcount)
@@ -808,7 +806,7 @@ sync_threadlists (void)
           process_stratum_target *proc_target
             = current_inferior ()->process_target ();
           thread = add_thread_with_info (proc_target,
-                                         ptid_t (infpid, 0, pbuf[pi].pthid),
+                                         ptid_t (pid, 0, pbuf[pi].pthid),
                                          priv);
 
           pi++;
@@ -818,7 +816,7 @@ sync_threadlists (void)
           ptid_t pptid, gptid;
           int cmp_result;
 
-          pptid = ptid_t (infpid, 0, pbuf[pi].pthid);
+          pptid = ptid_t (pid, 0, pbuf[pi].pthid);
           gptid = gbuf[gi]->ptid;
           pdtid = pbuf[pi].pdtid;
           tid = pbuf[pi].tid;
@@ -872,10 +870,11 @@ iter_tid (struct thread_info *thread, void *tidp)
 
 /* Synchronize libpthdebug's state with the inferior and with GDB,
    generate a composite process/thread for the current thread,
-   set inferior_ptid to if SET_INFPID, and return .  */
+   Return the ptid of the event thread if one can be found, else
+   return a pid-only ptid with PID.  */
 
 static ptid_t
-pd_update (int set_infpid)
+pd_update (int pid)
 {
   int status;
   ptid_t ptid;
@@ -883,36 +882,33 @@ pd_update (int set_infpid)
   struct thread_info *thread = NULL;
 
   if (!pd_active)
-    return inferior_ptid;
+    return ptid_t (pid);
 
   status = pthdb_session_update (pd_session);
   if (status != PTHDB_SUCCESS)
-    return inferior_ptid;
+    return ptid_t (pid);
 
-  sync_threadlists ();
+  sync_threadlists (pid);
 
   /* Define "current thread" as one that just received a trap signal.  */
 
-  tid = get_signaled_thread ();
+  tid = get_signaled_thread (pid);
   if (tid != 0)
     thread = iterate_over_threads (iter_tid, &tid);
   if (!thread)
-    ptid = inferior_ptid;
+    ptid = ptid_t (pid);
   else
-    {
-      ptid = thread->ptid;
-      if (set_infpid)
-        switch_to_thread (thread);
-    }
+    ptid = thread->ptid;
+
   return ptid;
 }
 
 /* Try to start debugging threads in the current process.
-   If successful and SET_INFPID, set inferior_ptid to reflect the
-   current thread.  */
+   If successful and there exists and we can find an event thread, return a ptid
+   for that thread.  Otherwise, return a ptid-only ptid using PID.  */
 
 static ptid_t
-pd_activate (int set_infpid)
+pd_activate (int pid)
 {
   int status;
 
@@ -921,10 +917,10 @@ pd_activate (int set_infpid)
                                &pd_session);
   if (status != PTHDB_SUCCESS)
     {
-      return inferior_ptid;
+      return ptid_t (pid);
     }
   pd_active = 1;
-  return pd_update (set_infpid);
+  return pd_update (pid);
 }
 
 /* Undo the effects of pd_activate().  */
@@ -1080,17 +1076,18 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
                          target_wait_flags options)
 {
   {
-    scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
-
     pid_to_prc (&ptid);
 
-    inferior_ptid = ptid_t (inferior_ptid.pid ());
     ptid = beneath ()->wait (ptid, status, options);
   }
 
   if (ptid.pid () == -1)
     return ptid_t (-1);
 
+  /* The target beneath does not deal with threads, so it should only return
+     pid-only ptids.  */
+  gdb_assert (ptid.is_pid ());
+
   /* Check whether libpthdebug might be ready to be initialized.  */
   if (!pd_active && status->kind () == TARGET_WAITKIND_STOPPED
       && status->sig () == GDB_SIGNAL_TRAP)
@@ -1102,10 +1099,10 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 
       if (regcache_read_pc (regcache)
           - gdbarch_decr_pc_after_break (gdbarch) == pd_brk_addr)
-        return pd_activate (0);
+        return pd_activate (ptid.pid ());
     }
 
-  return pd_update (0);
+  return pd_update (ptid.pid ());
 }
 
 /* Record that the 64-bit general-purpose registers contain VALS.  */
@@ -1765,7 +1762,7 @@ aix_thread_target::pid_to_str (ptid_t ptid)
   if (!PD_TID (ptid))
     return beneath ()->pid_to_str (ptid);
 
-  return string_printf (_("Thread %ld"), ptid.tid ());
+  return string_printf (_("Thread %s"), pulongest (ptid.tid ()));
 }
 
 /* Return a printable representation of extra information about
diff --git a/gdb/rs6000-aix-nat.c b/gdb/rs6000-aix-nat.c
index 8563aea313a2..f604f7d503e9 100644
--- a/gdb/rs6000-aix-nat.c
+++ b/gdb/rs6000-aix-nat.c
@@ -523,13 +523,12 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
                       _("Child process unexpectedly missing: %s.\n"),
                       safe_strerror (save_errno));
 
-          /* Claim it exited with unknown signal.  */
-          ourstatus->set_signalled (GDB_SIGNAL_UNKNOWN);
-          return inferior_ptid;
+          ourstatus->set_ignore ();
+          return minus_one_ptid;
         }
 
       /* Ignore terminated detached child processes.  */
-      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
+      if (!WIFSTOPPED (status) && find_inferior_pid (this, pid) == nullptr)
         pid = -1;
     }
   while (pid == -1);
-- 
2.36.1