From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2065.outbound.protection.outlook.com [40.107.93.65]) by sourceware.org (Postfix) with ESMTPS id 346BA386D637 for ; Fri, 15 Dec 2023 10:51:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 346BA386D637 Authentication-Results: sourceware.org; dmarc=fail (p=quarantine dis=none) header.from=amd.com Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=amd.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 346BA386D637 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.93.65 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1702637513; cv=pass; b=VABm4+Z5o5IgJO5bCwTwLyME/ebyOy1Yi6flTTri3txBZCm2yHTTobyGCtPMW6euUlAsRxawDlJl1vybdzo0nRXeQRM1or/PanTPIpRzN9KCxqzPDQoHDwQNgMaLha+y9DA8sqMhO17dnQ+QDnow5nSDPVaci/9oP1s4ODpuekM= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1702637513; c=relaxed/simple; bh=ro1tpNb5+RKkxoOhybiSMJ0bLNtwmyxJs3CDZK4Vg+A=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=xoCnvx+Rb2ZEv9JzW+q0MUYFu5uYCBf/0T3UNi7Cg78EKd9zpaXxEw4FGGk73CpY68tHigv+LMBAx99Iu+mbGfO3nRimP14kCWvnaKiiFHaj67/ef8evVl9+HCw5Vg1ET9NYpJ4xErznCkVUYgN5CetNby/l3oNqWJZq2LSUdbM= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SAu1ziuBf8hcs8lrdz7LJ14gcpLVS5ftjtq8zy/uZt+zojd7RtiRFYvI00YYbCr/1Il7MiesHbbbHuHNOEdu9cgAe/ioVNZGr+cHeKDmsJ/YSQsSpdvXswiGuQskghcNMKVHKoAlOg6DO/pGywT7KLkHxBt3j14HMoZpmkFpu/wOXrnv7S7JG52k5r5Co3lmOde8YO7R58HWFv2prPNkugr5Y7SeKLBbbo7y/JeSHQOoyiO+pfUl7bRcY0HxDH0GcLik4UL3rHx1PIguKnukIf1E7ypd4OOYI75gLvgYVheM8SRaOdZuzR8K9PExw0Jwbc0Xtlyjjh31Hb0JnCVQ2Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=USjRUTKxYmICW0c6PTD0ZryaRVBuCncdsGHoz6U3tXw=; b=bBWdw/K4KUFx5cUfsEcEm3xoCkGn+21Yz72QMUgfnzXRjw2Gl5caoh6JAXyZW/bhS6eJWK2GhBN42Fz/8aKXvTSZkVbw/LNCFVR2q2ve+WeSqCS7Ey2AT9Pp9YM2s/RVVscbzxkOvne7ATqBF5BswS1H1QtLzEkJ1Qhu7nADkZh/KWTzVRc9Wqh1M3V7/cIk30GzqmrTxz3JHtPEpD+CzsXKIP+J1pmp9cJw3jW5llrvD4tiXt30o/RSKyshWoSzz+rbfnZmg2ETuy5Vz5yFZ/cc4/lcuNo/2Cw9EPAv2D8nNG6ssYSI94rk6gH/lLSQkVxI/djSLpd60GYw6qv38A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=palves.net smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=USjRUTKxYmICW0c6PTD0ZryaRVBuCncdsGHoz6U3tXw=; b=JVOOZz0dPBki57r/eWqvl4xbErVuVCpq9Z0Qmuo4bMFenkni3SlCqIWj2Kn5q1DHj34RlLnbSrfIoS3yaAfe+8rjyLwf1jHaPxcFkQ4M9Ukh13NI2JMq1lLA+JKqI3t9+StqBl3gzHCACwD4eukb3oF80Mgm13idmeBHE5rRKTE= Received: from MN2PR18CA0007.namprd18.prod.outlook.com (2603:10b6:208:23c::12) by LV8PR12MB9449.namprd12.prod.outlook.com (2603:10b6:408:204::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7091.28; Fri, 15 Dec 2023 10:51:40 +0000 Received: from BL02EPF0001A105.namprd05.prod.outlook.com (2603:10b6:208:23c:cafe::aa) by MN2PR18CA0007.outlook.office365.com (2603:10b6:208:23c::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7091.26 via Frontend Transport; Fri, 15 Dec 2023 10:51:40 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BL02EPF0001A105.mail.protection.outlook.com (10.167.241.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7113.14 via Frontend Transport; Fri, 15 Dec 2023 10:51:40 +0000 Received: from khazad-dum (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Fri, 15 Dec 2023 04:51:38 -0600 Date: Fri, 15 Dec 2023 10:51:30 +0000 From: Lancelot SIX To: Pedro Alves CC: Subject: Re: [PATCH 6/8] Fix handling of vanishing threads that were stepping/stopping Message-ID: <20231215105130.kjig2ad2q6tctkkj@khazad-dum> References: <20231214202238.1065676-1-pedro@palves.net> <20231214202238.1065676-7-pedro@palves.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20231214202238.1065676-7-pedro@palves.net> X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BL02EPF0001A105:EE_|LV8PR12MB9449:EE_ X-MS-Office365-Filtering-Correlation-Id: 8931022b-baeb-4610-6123-08dbfd5bd1ae X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: WeywBpC+buyxcY0Yzd/4TFRuxkntiffEUnwFV2Z7ZVm/J4nxABOeH3Ji3wW+J1SgtaBfuYklA8ykGg+TffyUXa01/XB2GIoMoelkkalkmo6NyyWtbd6KScZ0k29p2ZByU5t4mTWzcUtJPkSgyOV3BUM3MC+3AfbgI1WjkR2BHH6dEmImjzgMmo+WxskmLNW34QpWT8WTI5DaI3kUSmGmcXXxSYJ3FO3zWfw/ctRKtCjeHfaUAeyjfu+mq/ZDzeAkj/NWTJ9o6DJQXNqBaYnUeuSol0DnXkjidKwYsD5zSu2K3jfvDAr0k5Tq1FnaES1rtxL7a3eHFtoBNMoIlFR5UErZT30hU7CLI8aXVTQauHMR/Fi8P5BhchYATC88VC034XuCqt3OjHwYwhYJUaX4tsSCoKyM9sgHMneADXnfrImQTemmMw3LrxoFgR8eHWomgSXN5nycmQeD8UTslakWkQYtnUhXZggM4Yc+00RlhSJ8kiGfZCNjQR4a07eBpsu3qFmOy5dzk6hYiOmwWcedb7ilvtw761/2PoWh4reKaS0qTO5HKH+iZpy16VmfwJgbtpLDptAu7EV5OWxMJLWTT+mmg1Dd6+VGfStZgLHtOL4UE27e+Q2BnwrrjvS5hQzf2t1W5SZMEhYt7Lxq8EGXxe9w+otxYfmnO+6hXQxBuZ0gGMwBjvig6EJp0JLYx4eOFahV4UxxzUy468SObAConliwyNoR0YwQo61ycuHNN/l4hD7iYUAej7mtQo8j0hXPA/TOT6J/CcMAjTGjghLMiw== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230031)(4636009)(39860400002)(346002)(376002)(396003)(136003)(230922051799003)(1800799012)(186009)(64100799003)(451199024)(82310400011)(46966006)(36840700001)(40470700004)(36860700001)(47076005)(55016003)(40460700003)(8936002)(2906002)(5660300002)(30864003)(33716001)(356005)(83380400001)(40480700001)(81166007)(82740400003)(86362001)(26005)(316002)(426003)(1076003)(336012)(16526019)(4326008)(70206006)(70586007)(6916009)(41300700001)(8676002)(9686003)(478600001)(6666004)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Dec 2023 10:51:40.4305 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 8931022b-baeb-4610-6123-08dbfd5bd1ae X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BL02EPF0001A105.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: LV8PR12MB9449 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Pedro, I have a minor comment below. On Thu, Dec 14, 2023 at 08:22:36PM +0000, Pedro Alves wrote: > Downstream, AMD is carrying a testcase > (gdb.rocm/continue-over-kernel-exit.exp) that exposes a couple issues > with the amd-dbgapi target's handling of exited threads. The test > can't be added upstream yet, unfortunately, due to dependency on DWARF > extensions that can't be upstreamed yet. However, it can be found on > the mailing list on the same series as this patch. > > The test spawns a kernel with a number of waves. The waves do nothing > but exit. There is a breakpoint on the s_endpgm instruction. Once > that breakpoint is hit, the test issues a "continue" command. We > should see one breakpoint hit per wave, and then the whole program > exiting. We do see that, however we also see this: > > [New AMDGPU Wave ?:?:?:1 (?,?,?)/?] > [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited] > *repeat for other waves* > ... > [Thread 0x7ffff626f640 (LWP 3048491) exited] > [Thread 0x7fffeb7ff640 (LWP 3048488) exited] > [Inferior 1 (process 3048475) exited normally] > > That "New AMDGPU Wave" output comes from infrun.c itself adding the > thread to the GDB thread list, because it got an event for a thread > not on the thread list yet. The output shows "?"s instead of proper > coordinates, because the event was a TARGET_WAITKIND_THREAD_EXITED, > i.e., the wave was already gone when infrun.c added the thread to the > thread list. > > That shouldn't ever happen for the amd-dbgapi target, threads should > only ever be added by the backend. > > Note "New AMDGPU Wave ?:?:?:1" is for wave 1. What happened was that > wave 1 terminated previously, and a previous call to > amd_dbgapi_target::update_thread_list() noticed the wave had vanished > and removed it from the GDB thread list. However, because the wave > was stepping when it terminated (due to the displaced step over the > s_endpgm) instruction, it is guaranteed that the amd-dbgapi library > queues a WAVE_COMMAND_TERMINATED event for the exit. > > When we process that WAVE_COMMAND_TERMINATED event, in > amd-dbgapi-target.c:process_one_event, we return it to the core as a > TARGET_WAITKIND_THREAD_EXITED event: > > static void > process_one_event (amd_dbgapi_event_id_t event_id, > amd_dbgapi_event_kind_t event_kind) > { > ... > if (status == AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID > && event_kind == AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED) > ws.set_thread_exited (0); > ... > } > > Recall the wave is already gone from the GDB thread list. So when GDB > sees that TARGET_WAITKIND_THREAD_EXITED event for a thread it doesn't > know about, it adds the thread to the thread list, resulting in that: > > [New AMDGPU Wave ?:?:?:1 (?,?,?)/?] > > and then, because it was a TARGET_WAITKIND_THREAD_EXITED event, GDB > marks the thread exited right afterwards: > > [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited] > > The fix is to make amd_dbgapi_target::update_thread_list() _not_ > delete vanishing waves iff they were stepping or in progress of being > stopped. These two cases are the ones dbgapi guarantees will result > in a WAVE_COMMAND_TERMINATED event if the wave terminates: > > /** > * A command for a wave was not able to complete because the wave has > * terminated. > * > * Commands that can result in this event are ::amd_dbgapi_wave_stop and > * ::amd_dbgapi_wave_resume in single step mode. Since the wave terminated > * before stopping, this event will be reported instead of > * ::AMD_DBGAPI_EVENT_KIND_WAVE_STOP. > * > * The wave that terminated is available by the ::AMD_DBGAPI_EVENT_INFO_WAVE > * query. However, the wave will be invalid since it has already terminated. > * It is the client's responsibility to know what command was being performed > * and was unable to complete due to the wave terminating. > */ > AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED = 2, > > As the comment says, it's GDB's responsability to know whether the > wave was stepping or being stopped. Since we now have a wave_info map > with one entry for each wave, that seems like the place to store that > information. However, I still decided to put all the coordinate > information in its own structure. I.e., basically renamed the > existing wave_info to wave_coordinates, and then added a new wave_info > structure that holds the new state, plus a wave_coordinates object. > This seemer cleaner as there are places where we only need to s/seemer/seemed/ maybe? > instantiate a wave_coordinates object. > > There's an extra twist. The testcase also exercises stopping at a new > kernel right after the first kernel fully exits. In that scenario, we > were hitting this assertion after the first kernel fully exits and the > hit of the breakpoint at the second kernel is handled: > > [amd-dbgapi] process_event_queue: Pulled event from dbgapi: event_id.handle = 26, event_kind = WAVE_STOP > [amd-dbgapi-lib] suspending queue_3, queue_2, queue_1 (refresh wave list) > ../../src/gdb/amd-dbgapi-target.c:1625: internal-error: amd_dbgapi_thread_deleted: Assertion `it != info->wave_info_map.end ()' failed. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > > This is the exact same problem as above, just a different > manifestation. In this scenario, we end up in update_thread_list > successfully deleting the exited thread (because it was no longer the > current thread) that was incorrectly added by infrun.c. Because it > was added by infrun.c and not by amd-dbgapi-target.c:add_gpu_thread, > it doesn't have an entry in the wave_info map, so > amd_dbgapi_thread_deleted trips on this assertion: > > gdb_assert (it != info->wave_info_map.end ()); > > here: > > ... > -> stop_all_threads > -> update_thread_list > -> target_update_thread_list > -> amd_dbgapi_target::update_thread_list > -> thread_db_target::update_thread_list > -> linux_nat_target::update_thread_list > -> delete_exited_threads > -> delete_thread > -> delete_thread_1 > -> gdb::observers::observable::notify > -> amd_dbgapi_thread_deleted > -> internal_error_loc > > The testcase thus tries both running to exit after the first kernel > exits, and running to a breakpoint in a second kernel after the first > kernel exits. > > Change-Id: I43a66f060c35aad1fe0d9ff022ce2afd0537f028 > --- > gdb/amd-dbgapi-target.c | 197 ++++++++++++++++++++++++++++++---------- > 1 file changed, 149 insertions(+), 48 deletions(-) > > diff --git a/gdb/amd-dbgapi-target.c b/gdb/amd-dbgapi-target.c > index 86102b7fb03..06f9e8c5f9c 100644 > --- a/gdb/amd-dbgapi-target.c > +++ b/gdb/amd-dbgapi-target.c > @@ -109,10 +109,9 @@ get_amd_dbgapi_target_inferior_created_observer_token () > return amd_dbgapi_target_inferior_created_observer_token; > } > > -/* A type holding coordinate, etc. info for a given wave. We cache > - this because we need this information after a wave exits. */ > +/* A type holding coordinates, etc. info for a given wave. */ > > -struct wave_info > +struct wave_coordinates > { > /* The wave. Set by the ctor. */ > amd_dbgapi_wave_id_t wave_id; > @@ -125,11 +124,44 @@ struct wave_info > uint32_t group_ids[3] {UINT32_MAX, UINT32_MAX, UINT32_MAX}; > uint32_t wave_in_group = UINT32_MAX; > > - explicit wave_info (amd_dbgapi_wave_id_t wave_id); > + explicit wave_coordinates (amd_dbgapi_wave_id_t wave_id) > + : wave_id (wave_id) > + {} > > - /* Return the target ID string for the wave this wave_info is > + /* Return the target ID string for the wave this wave_coordinates is > for. */ > std::string to_string () const; > + > + /* Pull out coordinates info from the amd-dbgapi library. */ > + void fetch (); > +}; > + > +/* A type holding info about a given wave. */ > + > +struct wave_info > +{ > + /* We cache the coordinates info because we need it after a wave > + exits. The wave's ID is here. */ > + wave_coordinates coords; > + > + /* The last resume_mode passed to amd_dbgapi_wave_resume for this > + wave. We track this because we are guaranteed to see a > + WAVE_COMMAND_TERMINATED event if a stepping wave terminates, and > + we need to know to not delete such a wave until we process that > + event. */ > + amd_dbgapi_resume_mode_t last_resume_mode = AMD_DBGAPI_RESUME_MODE_NORMAL; > + > + /* Whether we've called amd_dbgapi_wave_stop for this wave and are > + waiting for its stop event. Similarly, we track this because > + we're guaranteed to get a WAVE_COMMAND_TERMINATED event if the > + wave terminates while being stopped. */ > + bool stopping = false; > + > + explicit wave_info (amd_dbgapi_wave_id_t wave_id) > + : coords (wave_id) > + { > + coords.fetch (); > + } > }; > > /* Big enough to hold the size of the largest register in bytes. */ > @@ -275,6 +307,19 @@ static struct amd_dbgapi_target the_amd_dbgapi_target; > static const registry::key > amd_dbgapi_inferior_data; > > +/* Fetch the amd_dbgapi_inferior_info data for the given inferior. */ > + > +static struct amd_dbgapi_inferior_info * > +get_amd_dbgapi_inferior_info (struct inferior *inferior) > +{ > + amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (inferior); > + > + if (info == nullptr) > + info = amd_dbgapi_inferior_data.emplace (inferior, inferior); > + > + return info; > +} > + > /* The async event handler registered with the event loop, indicating that we > might have events to report to the core and that we'd like our wait method > to be called. > @@ -289,7 +334,7 @@ static const registry::key > static async_event_handler *amd_dbgapi_async_event_handler = nullptr; > > std::string > -wave_info::to_string () const > +wave_coordinates::to_string () const > { > std::string str = "AMDGPU Wave"; > > @@ -319,37 +364,41 @@ wave_info::to_string () const > return str; > } > > -wave_info::wave_info (amd_dbgapi_wave_id_t wave_id) > - : wave_id (wave_id) > -{ > -} > - > -/* Read in wave_info for WAVE_ID. */ > - > -static wave_info > -get_wave_info (amd_dbgapi_wave_id_t wave_id) > +void > +wave_coordinates::fetch () > { > - wave_info res (wave_id); > - > /* Any field that fails to be read is left with its in-class > initialized value, which is printed as "?". */ > > amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_AGENT, > - sizeof (res.agent_id), &res.agent_id); > + sizeof (agent_id), &agent_id); > amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_QUEUE, > - sizeof (res.queue_id), &res.queue_id); > + sizeof (queue_id), &queue_id); > amd_dbgapi_wave_get_info (wave_id, AMD_DBGAPI_WAVE_INFO_DISPATCH, > - sizeof (res.dispatch_id), &res.dispatch_id); > + sizeof (dispatch_id), &dispatch_id); > > amd_dbgapi_wave_get_info (wave_id, > AMD_DBGAPI_WAVE_INFO_WORKGROUP_COORD, > - sizeof (res.group_ids), &res.group_ids); > + sizeof (group_ids), &group_ids); > > amd_dbgapi_wave_get_info (wave_id, > AMD_DBGAPI_WAVE_INFO_WAVE_NUMBER_IN_WORKGROUP, > - sizeof (res.wave_in_group), &res.wave_in_group); > + sizeof (wave_in_group), &wave_in_group); > +} > + > +/* Get the wave_info object for TP, from the wave_info map. It is > + assumed that the wave is in the map. */ > + > +static wave_info & > +get_thread_wave_info (thread_info *tp) > +{ > + amd_dbgapi_inferior_info *info = get_amd_dbgapi_inferior_info (tp->inf); > + amd_dbgapi_wave_id_t wave_id = get_amd_dbgapi_wave_id (tp->ptid); > + > + auto it = info->wave_info_map.find (wave_id.handle); > + gdb_assert (it != info->wave_info_map.end ()); > > - return res; > + return it->second; > } > > /* Clear our async event handler. */ > @@ -370,19 +419,6 @@ async_event_handler_mark () > mark_async_event_handler (amd_dbgapi_async_event_handler); > } > > -/* Fetch the amd_dbgapi_inferior_info data for the given inferior. */ > - > -static struct amd_dbgapi_inferior_info * > -get_amd_dbgapi_inferior_info (struct inferior *inferior) > -{ > - amd_dbgapi_inferior_info *info = amd_dbgapi_inferior_data.get (inferior); > - > - if (info == nullptr) > - info = amd_dbgapi_inferior_data.emplace (inferior, inferior); > - > - return info; > -} > - > /* Set forward progress requirement to REQUIRE for all processes of PROC_TARGET > matching PTID. */ > > @@ -565,12 +601,12 @@ amd_dbgapi_target::pid_to_str (ptid_t ptid) > > auto it = info->wave_info_map.find (wave_id.handle); > if (it != info->wave_info_map.end ()) > - return it->second.to_string (); > + return it->second.coords.to_string (); > > /* A wave we don't know about. Shouldn't usually happen, but > asserting and bringing down the session is a bit too harsh. Just > print all unknown info as "?"s. */ > - return wave_info (wave_id).to_string (); > + return wave_coordinates (wave_id).to_string (); > } > > const char * > @@ -694,16 +730,24 @@ amd_dbgapi_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo) > > amd_dbgapi_wave_id_t wave_id = get_amd_dbgapi_wave_id (thread->ptid); > amd_dbgapi_status_t status; > + > + wave_info &wi = get_thread_wave_info (thread); > + amd_dbgapi_resume_mode_t &resume_mode = wi.last_resume_mode; > + amd_dbgapi_exceptions_t wave_exception; > if (thread->ptid == inferior_ptid) > - status = amd_dbgapi_wave_resume (wave_id, > - (step > - ? AMD_DBGAPI_RESUME_MODE_SINGLE_STEP > - : AMD_DBGAPI_RESUME_MODE_NORMAL), > - exception); > + { > + resume_mode = (step > + ? AMD_DBGAPI_RESUME_MODE_SINGLE_STEP > + : AMD_DBGAPI_RESUME_MODE_NORMAL); > + wave_exception = exception; > + } > else > - status = amd_dbgapi_wave_resume (wave_id, AMD_DBGAPI_RESUME_MODE_NORMAL, > - AMD_DBGAPI_EXCEPTION_NONE); > + { > + resume_mode = AMD_DBGAPI_RESUME_MODE_NORMAL; > + wave_exception = AMD_DBGAPI_EXCEPTION_NONE; > + } > > + status = amd_dbgapi_wave_resume (wave_id, resume_mode, wave_exception); > if (status != AMD_DBGAPI_STATUS_SUCCESS > /* Ignore the error that wave is no longer valid as that could > indicate that the process has exited. GDB treats resuming a > @@ -711,6 +755,8 @@ amd_dbgapi_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo) > && status != AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID) > error (_("wave_resume for wave_%ld failed (%s)"), wave_id.handle, > get_status_string (status)); > + > + wi.stopping = false; > } > } > > @@ -725,6 +771,15 @@ amd_dbgapi_target::commit_resumed () > require_forward_progress (minus_one_ptid, proc_target, true); > } > > +/* Return a string version of RESUME_MODE, for debug log purposes. */ > +static const char * > +resume_mode_to_string (amd_dbgapi_resume_mode_t resume_mode) > +{ > + return (resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP > + ? "step" > + : "normal"); Pedantically, amd_dbgapi_resume_mode_t is an enum and could in theory get more enumerators. I don't expect any, but the following construct would make the compiler complain if that ever happens: static const char * resume_mode_to_string (amd_dbgapi_resume_mode_t resume_mode) { switch (resume_mode) { case AMD_DBGAPI_RESUME_MODE_NORMAL: return "normal"; case AMD_DBGAPI_RESUME_MODE_SINGLE_STEP: return "step"; } gdb_assert_not_reached ("invalid amd_dbgapi_resume_mode_t"); } > +} > + > void > amd_dbgapi_target::stop (ptid_t ptid) > { > @@ -758,7 +813,11 @@ amd_dbgapi_target::stop (ptid_t ptid) > > status = amd_dbgapi_wave_stop (wave_id); > if (status == AMD_DBGAPI_STATUS_SUCCESS) > - return; > + { > + wave_info &wi = get_thread_wave_info (thread); > + wi.stopping = true; > + return; > + } > > if (status != AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID) > error (_("wave_stop for wave_%ld failed (%s)"), wave_id.handle, > @@ -772,6 +831,23 @@ amd_dbgapi_target::stop (ptid_t ptid) > could have terminated since the last time the wave list was > refreshed. */ > > + wave_info &wi = get_thread_wave_info (thread); > + wi.stopping = true; > + > + amd_dbgapi_debug_printf ("got AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID " > + "for wave_%ld, last_resume_mode=%s, " > + "report_thread_events=%d", > + wave_id.handle, > + resume_mode_to_string (wi.last_resume_mode), > + m_report_thread_events); > + > + /* If the wave was stepping when it terminated, then it is > + guaranteed that we will see a WAVE_COMMAND_TERMINATED event > + for it. Don't report a thread exit event or delete the > + thread yet, until we see such event. */ > + if (wi.last_resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP) > + return; > + > if (m_report_thread_events) > { > get_amd_dbgapi_inferior_info (thread->inf)->wave_events.emplace_back > @@ -1018,7 +1094,7 @@ add_gpu_thread (inferior *inf, ptid_t wave_ptid) > auto wave_id = get_amd_dbgapi_wave_id (wave_ptid); > > if (!info->wave_info_map.try_emplace (wave_id.handle, > - get_wave_info (wave_id)).second) > + wave_info (wave_id)).second) > internal_error ("wave ID %ld already in map", wave_id.handle); > > /* Create new GPU threads silently to avoid spamming the terminal > @@ -1770,7 +1846,32 @@ amd_dbgapi_target::update_thread_list () > auto it = threads.find (tp->ptid.tid ()); > > if (it == threads.end ()) > - delete_thread_silent (tp); > + { > + auto wave_id = get_amd_dbgapi_wave_id (tp->ptid); > + wave_info &wi = get_thread_wave_info (tp); > + > + /* Waves that were stepping or in progress of being > + stopped are guaranteed to report a > + WAVE_COMMAND_TERMINATED event if they terminate. > + Don't delete such threads until we see the > + event. */ > + if (wi.last_resume_mode == AMD_DBGAPI_RESUME_MODE_SINGLE_STEP > + || wi.stopping) > + { > + amd_dbgapi_debug_printf > + ("wave_%ld disappeared, keeping it" > + " (last_resume_mode=%s, stopping=%d)", > + wave_id.handle, > + resume_mode_to_string (wi.last_resume_mode), > + wi.stopping); > + } > + else > + { > + amd_dbgapi_debug_printf ("wave_%ld disappeared, deleting it", > + wave_id.handle); > + delete_thread_silent (tp); > + } > + } > else > threads.erase (it); > } > > -- > 2.43.0 > I happy with the patch as it is if you prefer to not change resume_mode_to_string. Either way Approved-By: Lancelot Six (amdgpu) Best, Lancelot.