From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aktemur, Tankut Baris"
To: Andrew Burgess, "gdb-patches@sourceware.org"
Subject: RE: [PATCHv2 2/8] gdb: don't restart vfork parent while waiting for child to finish
Date: Wed, 5 Jul 2023 10:08:51 +0000
On Tuesday, July 4, 2023 5:23 PM, Andrew Burgess wrote:
> While working on a later patch, which changes gdb.base/foll-vfork.exp,
> I noticed that sometimes I would hit this assert:
>
>   x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
>
> I eventually tracked it down to a combination of schedule-multiple
> mode being on, target-non-stop being off, follow-fork-mode being set
> to child, and some bad timing.  The failing case is pretty simple: a
> single-threaded application performs a vfork, the child process then
> execs some other application while the parent process (once the vfork
> child has completed its exec) just exits.  As best I understand
> things, here's what happens when things go wrong:
>
> 1. The parent process performs a vfork; GDB sees the VFORKED event
>    and creates an inferior and thread for the vfork child,
>
> 2. GDB resumes the vfork child process.  As schedule-multiple is on
>    and target-non-stop is off, this is translated into a request to
>    start all processes (see user_visible_resume_ptid),
>
> 3. In the linux-nat layer we spot that one of the threads we are
>    about to start is a vfork parent, and so don't start that thread
>    (see resume_lwp); the vfork child thread is resumed,
>
> 4. GDB waits for the next event, eventually entering
>    linux_nat_target::wait, which in turn calls linux_nat_wait_1,
>
> 5. In linux_nat_wait_1 we eventually call
>    resume_stopped_resumed_lwps; this should restart threads that
>    have stopped but don't actually have anything interesting to
>    report,
>
> 6. Unfortunately, resume_stopped_resumed_lwps doesn't check for
>    vfork parents like resume_lwp does, so at this point the vfork
>    parent is resumed.  This feels like the start of the bug, and this
>    is where I'm proposing to fix things; but resuming the vfork
>    parent isn't the worst thing in the world, because...
>
> 7. As the vfork child is still alive, the kernel holds the vfork
>    parent stopped,
>
> 8. Eventually the child performs its exec and GDB is sent an EXECD
>    event.
>    However, because the parent is resumed, as soon as the child
>    performs its exec the vfork parent also sends a VFORK_DONE event
>    to GDB,
>
> 9. Depending on timing, both of these events might seem to arrive in
>    GDB at the same time.  Normally GDB expects to see the EXECD or
>    EXITED/SIGNALLED event from the vfork child before getting the
>    VFORK_DONE in the parent.  We know this because it is as a result
>    of the EXECD/EXITED/SIGNALLED event that GDB detaches from the
>    parent (see handle_vfork_child_exec_or_exit for details).
>    Further, the comment in target/waitstatus.h on
>    TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached
>    to the child (not the parent) we should not expect to see a
>    VFORK_DONE,
>
> 10. If both events arrive at the same time then GDB will randomly
>    choose one event to handle first; in some cases this will be the
>    VFORK_DONE.  As described above, upon seeing a VFORK_DONE GDB
>    expects that (a) the vfork child has finished; however, in this
>    case that is not completely true: the child has finished, but GDB
>    has not yet processed the event associated with the completion;
>    and (b) upon seeing a VFORK_DONE GDB assumes we are remaining
>    attached to the parent, and so resumes the parent process,
>
> 11. GDB now handles the EXECD event.  In our case we are detaching
>    from the parent, so GDB calls target_detach (see
>    handle_vfork_child_exec_or_exit),
>
> 12. While this has been going on the vfork parent is executing, and
>    might even exit,
>
> 13. In linux_nat_target::detach the first thing we do is stop all
>    threads in the process we're detaching from; the result of the
>    stop request will be cached on the lwp_info object,
>
> 14. In our case the vfork parent has exited, though, so when GDB
>    waits for the thread, instead of a stop due to a signal, we
>    instead get a thread-exited status,
>
> 15. Later in the detach process we try to resume the threads just
>    prior to making the ptrace call to actually detach (see
>    detach_one_lwp); as part of the process to resume a thread we try
>    to touch some registers within the thread, and before doing this
>    GDB asserts that the thread is stopped,
>
> 16. An exited thread is not classified as stopped, and so the assert
>    triggers!
>
> So there are two bugs I see here.  The first, and most critical,
> is in step #6.  I think that resume_stopped_resumed_lwps should not
> resume a vfork parent, just like resume_lwp doesn't resume a vfork
> parent.
>
> With this change in place the vfork parent will remain stopped in
> step #6; instead GDB will only see the EXECD/EXITED/SIGNALLED event.
> The problems in #9 and #10 are therefore skipped and we arrive at
> #11, handling the EXECD event.  As the parent is still stopped, #12
> doesn't apply, and in #13, when we try to stop the process, we will
> see that it is already stopped; there's no risk of the vfork parent
> exiting before we get to this point.  And finally, in #15 we are
> safe to poke the process registers because it will not have exited
> by this point.
>
> However, I did mention two bugs.
>
> The second bug I've not yet managed to actually trigger, but I'm
> convinced it must exist: if we forget vforks for a moment, in step
> #13 above, when linux_nat_target::detach is called, we first try to
> stop all threads in the process GDB is detaching from.  If we
> imagine a multi-threaded inferior with many threads, and GDB running
> in non-stop mode, then, if the user tries to detach, there is a
> chance that a thread could exit just as linux_nat_target::detach is
> entered, in which case we should be able to trigger the same assert.
>
> But, like I said, I've not (yet) managed to trigger this second bug,
> and even if I could, the fix would not belong in this commit, so I'm
> pointing this out just for completeness.
>
> There's no test included in this commit.
> In a couple of commits time I will expand gdb.base/foll-vfork.exp,
> which is when this bug would be exposed.  Unfortunately, there are
> at least two other bugs (separate from the ones discussed above)
> that need fixing first; these will be fixed in the next commits,
> before the gdb.base/foll-vfork.exp test is expanded.
>
> If you do want to reproduce this failure then you will certainly
> need to run the gdb.base/foll-vfork.exp test in a loop, as the
> failures are all very timing sensitive.  I've found that running
> multiple copies in parallel makes the failure more likely to appear;
> I usually run ~6 copies in parallel and expect to see a failure
> within 10 minutes.
> ---
>  gdb/linux-nat.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 383ef58fa23..7e121b7ab41 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -3346,7 +3346,14 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>  static int
>  resume_stopped_resumed_lwps (struct lwp_info *lp, const ptid_t wait_ptid)
>  {
> -  if (!lp->stopped)
> +  struct inferior *inf = find_inferior_ptid (linux_target, lp->ptid);

Nit: The 'struct' keyword can be omitted.

-Baris

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928