From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id E5E8B3858D20 for ; Mon, 28 Oct 2024 17:15:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E5E8B3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E5E8B3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730135727; cv=none; b=FLFCIWVJVfWI1EMYYr1GD824iSzdcxgJE2CYFCZq0mmqCzkUeYYuKWwJ7i0yRpaL23xfvTTYPE2fwCnH2hR/rIdhyIf6WR5l5gN/ONnbLA9pzdvBY9V/JCNm2BoDwirA0A9+LfwcW+CLw7B2TX2zGAbMWn+ElE1kkNz4JH8rKVg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730135727; c=relaxed/simple; bh=d3Z8g0DObw6KDnBFPZg8krqaHq66MpjKAD3CLi4HWuI=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=oIvXUEEr8OS0BQsMDE7pNxuQu7wOoHHsU5FYVpiPUrw/OJpOMwKckWIgUWwRLCUatAAcw6Nv+6seqE8yHKzIA5A1O2a5zlxNcPLow7o1UUpuhbTSCjcNsUS7skbQCO90+1/nf+WgljdVed/3ffkH1fRqsKlBH7WwqBVT7cDcA+M= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1730135724; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=wZbdMraggB0H2LKoCd1p1BKKGkl7lT02Sp0Xtj2k7yw=; b=YK4OZHvGLcEtTAm8PV2qwiH9C28TQxlAxSSpqSg7w4PFNsU18kLzjDNn/tGH2mvrWFUw5C kHxHLtYap4b6xtRuZzjYuw/qQemmlP0y6jzoUlMUUXwklvYtNUk5olpJD0aoxbrvxqLxuw IMhPZFLIU+dtXTFB18f7MvgfpEktrMM= Received: from mail-lf1-f71.google.com (mail-lf1-f71.google.com [209.85.167.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-646-Jxr80ugNMYqUHLUXVD5IHg-1; Mon, 28 Oct 2024 13:15:23 -0400 X-MC-Unique: Jxr80ugNMYqUHLUXVD5IHg-1 Received: by mail-lf1-f71.google.com with SMTP id 2adb3069b0e04-539e4b7c8f4so3976313e87.1 for ; Mon, 28 Oct 2024 10:15:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730135721; x=1730740521; h=mime-version:message-id:date:references:in-reply-to:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wZbdMraggB0H2LKoCd1p1BKKGkl7lT02Sp0Xtj2k7yw=; b=mS1SoK0JNgh9hrlmPceOq7F2rgN3bvQXMEFmGq7Rw1eQYW2343mtmpcRebe0w20lRv WCEKYwwn8m6vomxboUEWRwdNln+2gzxAbdSSGATWrzsKaoR/xZoSv2WQMNmNZSPLS4HQ eK8cv1uPueMDozqifZymeoDLfG9A9JNcFJ9A5wnqA8b3TTVTK54zN26JB/zqTkZRl5/d /DBkjX/ZfoVg3sjCh9QAxCsOJYzM+Fmx4sH9GhPXxkaeoqUYiZNF0CiXq6f/kr4Lwvb1 Q1i7ptWlb+DQ2fo0LyvIM7kZSVpYes4nhvbY1tK/fQKrUgDd7gnDYObboL3+Hg3uSSfU e44A== X-Forwarded-Encrypted: i=1; AJvYcCVp4lS2xrS4YGoxjjg7uD/XpC1H2Fleb4QK0i7kPSzjlTsjjmO+dk3SrYe5t3UmbPA49NSbWOJgWwiVTA==@sourceware.org X-Gm-Message-State: AOJu0Yybzja/buePCO6cKeVhq3zV7KQtIpoeNzFs5b+c0fpjZ10CaPsQ DcaIdrClUftSW2+ilcB2U5D0EHlM+RV9dtQUjJfxc58wtEl0L7ICObixMqfsDtkRHU5HQUhOD+6 TGRFTFpCb/WmvyQz4ghPZi7fcFsuzk0sYJQtifEACc5oPuwzmfd0YwaCxWfKVx4kBt7M= X-Received: by 2002:a05:6512:31c1:b0:530:c239:6fad with SMTP id 2adb3069b0e04-53b347234femr4328788e87.0.1730135721297; Mon, 28 Oct 2024 10:15:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGg/eMp1tCqTs5TR3ZYUMHdsLr1ysnFdTiMfcLZxSEnYDUe7hT1WIs6lF4rMX+E+7hmRY8hZA== X-Received: by 2002:a05:6512:31c1:b0:530:c239:6fad with SMTP id 2adb3069b0e04-53b347234femr4328764e87.0.1730135720756; Mon, 28 Oct 2024 10:15:20 -0700 (PDT) Received: from localhost (197.209.200.146.dyn.plus.net. [146.200.209.197]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38058b3bb57sm10047751f8f.38.2024.10.28.10.15.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2024 10:15:20 -0700 (PDT) From: Andrew Burgess To: "Metzger, Markus T" Cc: "pedro@palves.net" , "gdb-patches@sourceware.org" Subject: RE: [PATCH v2 5/6] gdb, infrun: fix silent inferior switch in do_target_wait() In-Reply-To: References: <20240411052604.87893-1-markus.t.metzger@intel.com> <20240411052604.87893-6-markus.t.metzger@intel.com> <87fropy9ri.fsf@redhat.com> <87wmhzvxy3.fsf@redhat.com> Date: Mon, 28 Oct 2024 17:15:19 +0000 Message-ID: <87ttcwup20.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: "Metzger, Markus T" writes: > Hello Andrew, > >>> Since the infrun do_target_wait() code obviously intends to do the wait per >>> inferior and just forgets to adjust the wait focus to one inferior, fixing that >>> seemed to be the better approach to me. >> >>I agree with you. But I do have one concern. As part of the next >>commit I think you should add: >> >> gdb_assert (ptid != minus_one_ptid); > > To trigger an issue, you would need two inferiors on the target, one of > which is replaying and the other is recording. > > That assert is too aggressive IMHO. I don't understand how you reach this conclusion, because... > > Also, the issue is not fatal. You should be able to C-c and recover by > replaying both inferiors. That's certainly not what we'd want, but that's > better than terminating the debug session with an assert. ... you seem to be saying that GDB can get into a non-responsive state, but it's OK as a user can interrupt and fix it. But that doesn't seem great. Wouldn't it be better to just fix record_btrace_target::wait so that it handles minus_one_ptid correctly? >>Which leads to my concern. With that assert in place we are, I claim, >>making the promise that target_ops::wait() will _never_ be called with >>minus_one_ptid, but this code: >> >> /* Make sure we're not widening WAIT_PTID. */ >> if (!ptid.matches (wait_ptid) >> /* Targets that cannot async will be asked for a blocking wait. >> >> Blocking wait does not work inferior-by-inferior if the target >> provides more than one inferior. Fall back to waiting for >> WAIT_PTID in that case. */ >> || !target_can_async_p () || ((options & TARGET_WNOHANG) == 0) >> /* FIXME: I don't see why we should have inferiors with zero pid, >> which indicates that the respective ptid is not a process. >> They do exist, though, and we cannot wait for them. */ >> || !ptid.is_pid ()) >> ptid = wait_ptid; >> >>>From your branch, in infrun.c, would, if I understand it correctly, mean >>that ptid could be set to minus_one_ptid in some cases. >> >>So my concern here is: are there situations where >>record_btrace_target::wait can end up being called with minus_one_ptid? > > Sure. > > The above code snippet gives a recipe how to construct such a scenario. > E.g. 'maintenance set target-async off' should do the trick. > > I tested this with native, native-gdbserver, and native-extended-gdbserver > and I have not seen fails in gdb.btrace tests, so I believe that with the in-tree > targets that support btrace it should work by default. I'm super confused, if we know that we can trigger record_btrace_target::wait to be called with minus_one_ptid, and we know there's a configuration in which record_btrace_target::wait is going to fail to find an event, then it's not really about whether any existing test fails or not, it's about whether GDB could enter that state or not. You said the minus_one_ptid assert is too aggressive, but the whole premise of this patch is that in a multi-inferior setup, if we call record_btrace_target::wait with minus_one_ptid then GDB can miss an event, right? So to me we have two choices, guarantee that record_btrace_target::wait is never call with minus_one_ptid, or make record_btrace_target::wait handle minus_one_ptid correctly. > That does not mean > that there couldn't be an out-of-tree target that would fail. Or that future > changes to GDB couldn't break this. I'm not fussed about out of tree stuff, but it sounds like there are problems with the current in-tree stuff... > > >>Lets be clear, I totally agree with you that for the case where >>TARGET_WNOHANG is set, it makes more sense that we wait on each inferior >>in turn. But I wonder if we should still be changing >>record_btrace_target::wait so that it can handle minus_one_ptid? > > Actually, I don't see how the current do_target_wait() code could work > with multiple blocking targets. > > It will prioritize a random target and then just hang, ignoring other > targets that might have events to report. You don't even need multiple > blocking targets; it suffices if one target is blocking. > > E.g. if you step one thread with scheduler-locking off (the default) and > then randomly pick a thread on a different target that happens to be > blocking. Yeah, I don't think the multi-inferior stuff will work with blocking targets. > > >>I applied this HACK: >> >> diff --git a/gdb/record-btrace.c b/gdb/record-btrace.c >> index 7937bb3f8db..367225c3320 100644 >> --- a/gdb/record-btrace.c >> +++ b/gdb/record-btrace.c >> @@ -2637,6 +2637,13 @@ record_btrace_target::wait (ptid_t ptid, struct >>target_waitstatus *status, >> >> if (moving.empty ()) >> { >> + if ((::execution_direction != EXEC_REVERSE) >> + && ptid == minus_one_ptid) >> + { >> + infrun_debug_printf ("forwarding the wait to target beneath"); >> + return this->beneath ()->wait (ptid, status, options); >> + } >> + >> *status = btrace_step_no_moving_threads (); >> >> DEBUG ("wait ended by %s: %s", null_ptid.to_string ().c_str (), >> >>and revert patch 5/6 in your series, and all of your new tests still >>pass. I don't think the above is the correct way to change ::wait, but >>it does hint that it might be possible to support minus_one_ptid. >> >>As I said, I do agree with you that do_target_wait could be improved, >>but I still need convincing that record_btrace_target::wait is OK as is. > > Without NOHANG or with a non-async target, the target beneath does > a blocking wait. This would prevent us from interleaving replaying and > recording as we would normally when waiting inferior-by-inferior with > NOHANG on async targets. > > We would very likely end up prioritizing replaying. We cannot afford to > forward the request to a potentially blocking target if there is work on > the replaying side. > > If there is no work, however, it should be safe to forward the request > and have the target beneath block or respond with NO_RESUMED. > > Let me experiment with that. OK, will look out for an update. Thanks, Andrew