From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 3AE0C3858405 for ; Wed, 23 Mar 2022 13:35:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3AE0C3858405 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-392-v6lBWB7DMFSxhPAE8ssj_A-1; Wed, 23 Mar 2022 09:35:01 -0400 X-MC-Unique: v6lBWB7DMFSxhPAE8ssj_A-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 37ACB804184; Wed, 23 Mar 2022 13:35:01 +0000 (UTC) Received: from [10.97.116.58] (ovpn-116-58.gru2.redhat.com [10.97.116.58]) by smtp.corp.redhat.com (Postfix) with ESMTPS id AC01340CF919; Wed, 23 Mar 2022 13:34:59 +0000 (UTC) Message-ID: <59a6639f-4873-66b9-71f9-733fa656c4ca@redhat.com> Date: Wed, 23 Mar 2022 10:34:56 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH v2 2/2] gdb: raise and handle NOT_AVAILABLE_ERROR when accessing frame PC To: "Aktemur, Tankut Baris" , "gdb-patches@sourceware.org" References: <5c408406f5dd349e47aad58d9a3129ab61c0c7fe.1644311043.git.tankut.baris.aktemur@intel.com> <0afbed24-06a0-d2b6-1e24-5d92e2bfb26e@redhat.com> From: Bruno Larsen In-Reply-To: X-Scanned-By: MIMEDefang 2.84 on 10.11.54.1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2022 13:35:07 -0000 On 3/23/22 09:55, Aktemur, Tankut Baris wrote: > On Wednesday, March 16, 2022 6:27 PM, Bruno Larsen wrote: >> Hello Tankut, >> >> On 2/8/22 06:15, Tankut Baris Aktemur via Gdb-patches wrote: >>> This patch can be considered a continuation of >>> >>> commit 4778a5f87d253399083565b4919816f541ebe414 >>> Author: Tom de Vries >>> Date: Tue Apr 21 15:45:57 2020 +0200 >>> >>> [gdb] Fix hang after ext sigkill >>> >>> and >>> >>> commit 47f1aceffa02be4726b854082d7587eb259136e0 >>> Author: Tankut Baris Aktemur >>> Date: Thu May 14 13:59:54 2020 +0200 >>> >>> gdb/infrun: handle already-exited threads when attempting to stop >>> >>> If a process dies before GDB reports the exit error to the user, we >>> may see the "Couldn't get registers: No such process." error message >>> in various places. For instance: >>> >>> (gdb) start >>> ... >>> (gdb) info inferior >>> Num Description Connection Executable >>> * 1 process 31943 1 (native) /tmp/a.out >>> (gdb) shell kill -9 31943 >>> (gdb) maintenance flush register-cache >>> Register cache flushed. >>> Couldn't get registers: No such process. >>> (gdb) info threads >>> Id Target Id Frame >>> * 1 process 31943 "a.out" Couldn't get registers: No such process. >>> (gdb) backtrace >>> Python Exception : Couldn't get registers: No such process. >>> Couldn't get registers: No such process. >>> (gdb) inferior 1 >>> Couldn't get registers: No such process. >>> (gdb) thread >>> [Current thread is 1 (process 31943)] >>> Couldn't get registers: No such process. >>> (gdb) >>> >>> The gdb.threads/killed-outside.exp, gdb.multi/multi-kill.exp, and >>> gdb.multi/multi-exit.exp tests also check related scenarios. >>> >>> To improve the situation, >>> >>> 1. when printing the frame info, catch and process a NOT_AVAILABLE_ERROR. >>> >>> 2. when accessing the target to fetch registers, if the operation >>> fails, raise a NOT_AVAILABLE_ERROR instead of a generic error, so >>> that clients can attempt to recover accordingly. This patch updates >>> the amd64_linux_nat_target and remote_target in this direction. >>> >>> With this patch, we obtain the following behavior: >>> >>> (gdb) start >>> ... >>> (gdb) info inferior >>> Num Description Connection Executable >>> * 1 process 748 1 (native) /tmp/a.out >>> (gdb) shell kill -9 748 >>> (gdb) maintenance flush register-cache >>> Register cache flushed. >>> (gdb) info threads >>> Id Target Id Frame >>> * 1 process 748 "a.out" >>> (gdb) backtrace >>> #0 >>> Backtrace stopped: not enough registers or memory available to unwind further >>> (gdb) inferior 1 >>> [Switching to inferior 1 [process 748] (/tmp/a.out)] >>> [Switching to thread 1 (process 748)] >>> #0 >>> (gdb) thread >>> [Current thread is 1 (process 748)] >>> (gdb) >>> >>> Here is another "before/after" case. Suppose we have two inferiors, >>> each having its own remote target underneath. Before this patch, we >>> get the following output: >>> >>> # Create two inferiors on two remote targets, resume both until >>> # termination. Exit event from one of them is shown first, but the >>> # other also exited -- just not yet shown. >>> (gdb) maint set target-non-stop on >>> (gdb) target remote | gdbserver - ./a.out >>> (gdb) add-inferior -no-connection >>> (gdb) inferior 2 >>> (gdb) target remote | gdbserver - ./a.out >>> (gdb) set schedule-multiple on >>> (gdb) continue >>> ... >>> [Inferior 2 (process 22127) exited normally] >>> (gdb) inferior 1 >>> [Switching to inferior 1 [process 22111] (target:/tmp/a.out)] >>> [Switching to thread 1.1 (Thread 22111.22111)] >>> Could not read registers; remote failure reply 'E01' >>> (gdb) info threads >>> Id Target Id Frame >>> * 1.1 Thread 22111.22111 "a.out" Could not read registers; remote failure reply 'E01' >>> (gdb) backtrace >>> Python Exception : Could not read registers; remote failure reply >> 'E01' >>> Could not read registers; remote failure reply 'E01' >>> (gdb) thread >>> [Current thread is 1.1 (Thread 22111.22111)] >>> Could not read registers; remote failure reply 'E01' >>> (gdb) >>> >>> With this patch, it becomes: >>> >>> ... >>> [Inferior 1 (process 11759) exited normally] >>> (gdb) inferior 2 >>> [Switching to inferior 2 [process 13440] (target:/path/to/a.out)] >>> [Switching to thread 2.1 (Thread 13440.13440)] >>> #0 in ?? () >>> (gdb) info threads >>> Id Target Id Frame >>> * 2.1 Thread 13440.13440 "a.out" in ?? () >>> (gdb) backtrace >>> #0 in ?? () >>> Backtrace stopped: not enough registers or memory available to unwind further >>> (gdb) thread >>> [Current thread is 2.1 (Thread 13440.13440)] >>> (gdb) >>> >>> Finally, together with its predecessor, this patch also fixes PR gdb/26877. >> >> While I think this is a good idea, it doesn't seem to fix the root cause of the bug you >> mentioned. It does stop the crash that the bug reports, but I would say the actual issue is >> that GDB is not noticing that the second inferior is also finished. My 2 cents, for what >> they're worth. > > The root cause was an unhandled error in a destructor. The 2-inferior setup was > just one way to expose it. From https://sourceware.org/bugzilla/show_bug.cgi?id=26877#c0: > > The problem is at: > > #20 0x00005555561128b9 in program_space::~program_space (this=0x55555830a070, __in_chrg=) at gdb/progspace.c:153 > > While inside a destructor, GDB wanted to access the frame information > of Inferior 2 in a series of calls. But because the process is dead, its > registers cannot be read. This raises an error inside a destructor, leading > to termination of GDB. > > From that perspective, I think the root cause is fixed. > It is the root cause of the crash, but GDB is still not aware that the second inferior has finished, which is still a problem, IMHO. Regardless, it looks like a good cleanup anyway, since even if we fix GDB noticing the process finishing, this crash could happen for other situations. >> The explanation in the commit message is great! It explains the problem quite well, I just >> don't understand why you only changed amd64_linux_nat_target and remote. I imagine this >> issue happens with all targets. I'd ask at least that some of the most common ones be >> changed and validated. > > Those two targets are the ones I can test reliably. For the others, > unfortunately I don't have a reliable way of regression-testing. > Ah, then I guess we should keep this fix in mind if someone reports a similar bug. >> Also, some extra testing revealed that the previous patch is not >> actually necessary to fix the crash. > > That's possible. The bug report in PR/26877 was just a starting point > for the submitted patches, which aim at addressing a more general problem. > >> As for technical review, I don't have any questions or comments, but I can't approve >> patches. > > Thanks for your comments! > > Regards > -Baris > > > Intel Deutschland GmbH > Registered Address: Am Campeon 10, 85579 Neubiberg, Germany > Tel: +49 89 99 8853-0, www.intel.de > Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva > Chairperson of the Supervisory Board: Nicole Lau > Registered Office: Munich > Commercial Register: Amtsgericht Muenchen HRB 186928 -- Cheers! Bruno Larsen