From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <blarsen@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTPS id 3AE0C3858405
 for <gdb-patches@sourceware.org>; Wed, 23 Mar 2022 13:35:05 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3AE0C3858405
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-392-v6lBWB7DMFSxhPAE8ssj_A-1; Wed, 23 Mar 2022 09:35:01 -0400
X-MC-Unique: v6lBWB7DMFSxhPAE8ssj_A-1
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com
 [10.11.54.1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 37ACB804184;
 Wed, 23 Mar 2022 13:35:01 +0000 (UTC)
Received: from [10.97.116.58] (ovpn-116-58.gru2.redhat.com [10.97.116.58])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id AC01340CF919;
 Wed, 23 Mar 2022 13:34:59 +0000 (UTC)
Message-ID: <59a6639f-4873-66b9-71f9-733fa656c4ca@redhat.com>
Date: Wed, 23 Mar 2022 10:34:56 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.7.0
Subject: Re: [PATCH v2 2/2] gdb: raise and handle NOT_AVAILABLE_ERROR when
 accessing frame PC
To: "Aktemur, Tankut Baris" <tankut.baris.aktemur@intel.com>,
 "gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
References: <cover.1644311043.git.tankut.baris.aktemur@intel.com>
 <5c408406f5dd349e47aad58d9a3129ab61c0c7fe.1644311043.git.tankut.baris.aktemur@intel.com>
 <0afbed24-06a0-d2b6-1e24-5d92e2bfb26e@redhat.com>
 <MWHPR1101MB2271B92929814AB965001AA4C4189@MWHPR1101MB2271.namprd11.prod.outlook.com>
From: Bruno Larsen <blarsen@redhat.com>
In-Reply-To: <MWHPR1101MB2271B92929814AB965001AA4C4189@MWHPR1101MB2271.namprd11.prod.outlook.com>
X-Scanned-By: MIMEDefang 2.84 on 10.11.54.1
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A,
 RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Mar 2022 13:35:07 -0000

On 3/23/22 09:55, Aktemur, Tankut Baris wrote:
> On Wednesday, March 16, 2022 6:27 PM, Bruno Larsen wrote:
>> Hello Tankut,
>>
>> On 2/8/22 06:15, Tankut Baris Aktemur via Gdb-patches wrote:
>>> This patch can be considered a continuation of
>>>
>>>     commit 4778a5f87d253399083565b4919816f541ebe414
>>>     Author: Tom de Vries <tdevries@suse.de>
>>>     Date:   Tue Apr 21 15:45:57 2020 +0200
>>>
>>>       [gdb] Fix hang after ext sigkill
>>>
>>> and
>>>
>>>     commit 47f1aceffa02be4726b854082d7587eb259136e0
>>>     Author: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
>>>     Date:   Thu May 14 13:59:54 2020 +0200
>>>
>>>       gdb/infrun: handle already-exited threads when attempting to stop
>>>
>>> If a process dies before GDB reports the exit error to the user, we
>>> may see the "Couldn't get registers: No such process." error message
>>> in various places.  For instance:
>>>
>>>     (gdb) start
>>>     ...
>>>     (gdb) info inferior
>>>       Num  Description       Connection           Executable
>>>     * 1    process 31943     1 (native)           /tmp/a.out
>>>     (gdb) shell kill -9 31943
>>>     (gdb) maintenance flush register-cache
>>>     Register cache flushed.
>>>     Couldn't get registers: No such process.
>>>     (gdb) info threads
>>>       Id   Target Id              Frame
>>>     * 1    process 31943 "a.out" Couldn't get registers: No such process.
>>>     (gdb) backtrace
>>>     Python Exception <class 'gdb.error'>: Couldn't get registers: No such process.
>>>     Couldn't get registers: No such process.
>>>     (gdb) inferior 1
>>>     Couldn't get registers: No such process.
>>>     (gdb) thread
>>>     [Current thread is 1 (process 31943)]
>>>     Couldn't get registers: No such process.
>>>     (gdb)
>>>
>>> The gdb.threads/killed-outside.exp, gdb.multi/multi-kill.exp, and
>>> gdb.multi/multi-exit.exp tests also check related scenarios.
>>>
>>> To improve the situation,
>>>
>>> 1. when printing the frame info, catch and process a NOT_AVAILABLE_ERROR.
>>>
>>> 2. when accessing the target to fetch registers, if the operation
>>>      fails, raise a NOT_AVAILABLE_ERROR instead of a generic error, so
>>>      that clients can attempt to recover accordingly.  This patch updates
>>>      the amd64_linux_nat_target and remote_target in this direction.
>>>
>>> With this patch, we obtain the following behavior:
>>>
>>>     (gdb) start
>>>     ...
>>>     (gdb) info inferior
>>>       Num  Description       Connection           Executable
>>>     * 1    process 748       1 (native)           /tmp/a.out
>>>     (gdb) shell kill -9 748
>>>     (gdb) maintenance flush register-cache
>>>     Register cache flushed.
>>>     (gdb) info threads
>>>       Id   Target Id           Frame
>>>     * 1    process 748 "a.out" <PC register is not available>
>>>     (gdb) backtrace
>>>     #0  <PC register is not available>
>>>     Backtrace stopped: not enough registers or memory available to unwind further
>>>     (gdb) inferior 1
>>>     [Switching to inferior 1 [process 748] (/tmp/a.out)]
>>>     [Switching to thread 1 (process 748)]
>>>     #0  <PC register is not available>
>>>     (gdb) thread
>>>     [Current thread is 1 (process 748)]
>>>     (gdb)
>>>
>>> Here is another "before/after" case.  Suppose we have two inferiors,
>>> each having its own remote target underneath.  Before this patch, we
>>> get the following output:
>>>
>>>     # Create two inferiors on two remote targets, resume both until
>>>     # termination.  Exit event from one of them is shown first, but the
>>>     # other also exited -- just not yet shown.
>>>     (gdb) maint set target-non-stop on
>>>     (gdb) target remote | gdbserver - ./a.out
>>>     (gdb) add-inferior -no-connection
>>>     (gdb) inferior 2
>>>     (gdb) target remote | gdbserver - ./a.out
>>>     (gdb) set schedule-multiple on
>>>     (gdb) continue
>>>     ...
>>>     [Inferior 2 (process 22127) exited normally]
>>>     (gdb) inferior 1
>>>     [Switching to inferior 1 [process 22111] (target:/tmp/a.out)]
>>>     [Switching to thread 1.1 (Thread 22111.22111)]
>>>     Could not read registers; remote failure reply 'E01'
>>>     (gdb) info threads
>>>       Id   Target Id                  Frame
>>>     * 1.1  Thread 22111.22111 "a.out" Could not read registers; remote failure reply 'E01'
>>>     (gdb) backtrace
>>>     Python Exception <class 'gdb.error'>: Could not read registers; remote failure reply
>> 'E01'
>>>     Could not read registers; remote failure reply 'E01'
>>>     (gdb) thread
>>>     [Current thread is 1.1 (Thread 22111.22111)]
>>>     Could not read registers; remote failure reply 'E01'
>>>     (gdb)
>>>
>>> With this patch, it becomes:
>>>
>>>     ...
>>>     [Inferior 1 (process 11759) exited normally]
>>>     (gdb) inferior 2
>>>     [Switching to inferior 2 [process 13440] (target:/path/to/a.out)]
>>>     [Switching to thread 2.1 (Thread 13440.13440)]
>>>     #0  <unavailable> in ?? ()
>>>     (gdb) info threads
>>>       Id   Target Id                   Frame
>>>     * 2.1  Thread 13440.13440 "a.out" <unavailable> in ?? ()
>>>     (gdb) backtrace
>>>     #0  <unavailable> in ?? ()
>>>     Backtrace stopped: not enough registers or memory available to unwind further
>>>     (gdb) thread
>>>     [Current thread is 2.1 (Thread 13440.13440)]
>>>     (gdb)
>>>
>>> Finally, together with its predecessor, this patch also fixes PR gdb/26877.
>>
>> While I think this is a good idea, it doesn't seem to fix the root cause of the bug you
>> mentioned. It does stop the crash that the bug reports, but I would say the actual issue is
>> that GDB is not noticing that the second inferior is also finished. My 2 cents, for what
>> they're worth.
> 
> The root cause was an unhandled error in a destructor.  The 2-inferior setup was
> just one way to expose it.  From https://sourceware.org/bugzilla/show_bug.cgi?id=26877#c0:
> 
> 	The problem is at:
> 
> 	#20 0x00005555561128b9 in program_space::~program_space (this=0x55555830a070, __in_chrg=<optimized out>) at gdb/progspace.c:153
> 
> 	While inside a destructor, GDB wanted to access the frame information
> 	of Inferior 2 in a series of calls.  But because the process is dead, its
> 	registers cannot be read.  This raises an error inside a destructor, leading
> 	to termination of GDB.
> 
>  From that perspective, I think the root cause is fixed.
> 

It is the root cause of the crash, but GDB is still not aware that the second inferior has finished, which is still a problem, IMHO.
Regardless, it looks like a good cleanup anyway, since even if we fix GDB noticing the process finishing, this crash could happen for other situations.


>> The explanation in the commit message is great! It explains the problem quite well, I just
>> don't understand why you only changed amd64_linux_nat_target and remote. I imagine this
>> issue happens with all targets. I'd ask at least that some of the most common ones be
>> changed and validated.
> 
> Those two targets are the ones I can test reliably.  For the others,
> unfortunately I don't have a reliable way of regression-testing.
> 

Ah, then I guess we should keep this fix in mind if someone reports a similar bug.

>> Also, some extra testing revealed that the previous patch is not
>> actually necessary to fix the crash.
> 
> That's possible.  The bug report in PR/26877 was just a starting point
> for the submitted patches, which aim at addressing a more general problem.
>   
>> As for technical review, I don't have any questions or comments, but I can't approve
>> patches.
>   
> Thanks for your comments!
> 
> Regards
> -Baris
> 
> 
> Intel Deutschland GmbH
> Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
> Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
> Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva
> Chairperson of the Supervisory Board: Nicole Lau
> Registered Office: Munich
> Commercial Register: Amtsgericht Muenchen HRB 186928


-- 
Cheers!
Bruno Larsen