From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) by sourceware.org (Postfix) with ESMTPS id 5A15F386EC6C for ; Sat, 6 Feb 2021 18:05:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 5A15F386EC6C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=palves.net Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=alves.ped@gmail.com Received: by mail-wm1-f46.google.com with SMTP id f16so8742527wmq.5 for ; Sat, 06 Feb 2021 10:05:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=+fIpnjOdSNmlu4Us6ntUKVv4nXGwWIP5OPLLDCYGmCQ=; b=OTB3X4e2CXJNCyqGKzXvrzQj7zVF0hP1Um2veEHc6nYU31nw86lTrf6WkRHZHkhkMM MM2PDhAV+fYpM421JVoEvBNJpGlo1FrRUjF4MOvGbiTJExE/UJPPBrLSpLatrCmCtfJj F0txVkLicy0uMcveqJbOTj/Q4BY+WpZGEth1DiYCVY6zo4Rz+dozjY3go7uDgTw24Mmx orVBOa1MpwikAhzrYJddbs/52tKvJz9hQfLi9SC+kBtaFN3xvfrJPXGVlvgEuJY22pg9 u/eHyyClAh2DzW9xbTqhwKo3Z4TT4iR8f9tLI2pwut41lxiIrZqAk/6I6z6viG3J2PQX +49w== X-Gm-Message-State: AOAM531/gBJuDOqjfTYWqPkjOPNcP+7/4GvfOp+ZyQflAiauK0xZ8sZi e0iwdT2EmsgBrw/s6cNsvUzc2ic4EiF8Fw== X-Google-Smtp-Source: ABdhPJyIOc72GJOuRj8g4NlZxwxfIU1rm4nJ6X3e1KHxL6wwNvTJOHfhL9j1QulIKzT0jzUGjuHU3w== X-Received: by 2002:a05:600c:19c7:: with SMTP id u7mr8185780wmq.122.1612634744922; Sat, 06 Feb 2021 10:05:44 -0800 (PST) Received: from ?IPv6:2001:8a0:f91f:e900:1d90:d745:3c32:c159? ([2001:8a0:f91f:e900:1d90:d745:3c32:c159]) by smtp.gmail.com with ESMTPSA id z63sm12411392wme.8.2021.02.06.10.05.43 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 06 Feb 2021 10:05:43 -0800 (PST) Subject: Re: [PATCH v4 3/3] gdb: generalize commit_resume, avoid commit-resuming when threads have pending statuses From: Pedro Alves To: Simon Marchi , gdb-patches@sourceware.org References: <20210125045730.1739754-1-simon.marchi@polymtl.ca> <20210125045730.1739754-4-simon.marchi@polymtl.ca> Message-ID: <1653a27e-05d9-3663-e3d7-c966247ec652@palves.net> Date: Sat, 6 Feb 2021 18:05:32 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: <20210125045730.1739754-4-simon.marchi@polymtl.ca> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Feb 2021 18:05:49 -0000 Hi Simon. This is looking good, but unfortunately, there's still one design problem to address. See below. On 25/01/21 04:57, Simon Marchi via Gdb-patches wrote: > @@ -3867,8 +3967,22 @@ fetch_inferior_event () > = make_scoped_restore (&execution_direction, > target_execution_direction ()); > > + /* Allow process stratum targets to pause their resumed threads while we > + handle the event. > + > + The optional is a bit ugly, but this is to allow calling > + maybe_call_commit_resumed_all_process_targets after destroying > + the object. */ > + gdb::optional disable_commit_resumed; > + disable_commit_resumed.emplace ("handling event"); > + > if (!do_target_wait (minus_one_ptid, ecs, TARGET_WNOHANG)) > - return; > + { > + infrun_debug_printf ("do_target_wait returned no event"); > + disable_commit_resumed.reset (); > + maybe_call_commit_resumed_all_process_targets (); > + return; > + } > > gdb_assert (ecs->ws.kind != TARGET_WAITKIND_IGNORE); > > @@ -3959,6 +4073,9 @@ fetch_inferior_event () > /* No error, don't finish the thread states yet. */ > finish_state.release (); > > + disable_commit_resumed.reset (); > + maybe_call_commit_resumed_all_process_targets (); > + > /* This scope is used to ensure that readline callbacks are > reinstalled here. */ > } Here (above) 's the problem. The issue is that as a result of handling an event, we can end re-resuming the target and synchronously waiting until that resumption is done, before we get to this disable_commit_resumed.reset() call. For example, try this: (any program with a loop will do, this is just the one I was already looking at) shell1: $ cd build/gdb $ ../gdbserver/gdbserver :9999 ./testsuite/outputs/gdb.threads/detach-step-over/detach-step-over shell2: $ export g="./gdb -data-directory=data-directory" $ $g ./testsuite/outputs/gdb.threads/detach-step-over/detach-step-over -ex "set sysroot" -ex "maint set target-non-stop on" -ex "tar remote :9999" -ex "b detach-step-over.c:54" -ex "c" -ex "set scheduler-locking on" .... Breakpoint 1 at 0x555555555227: file /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c, line 54. Continuing. .... [Switching to Thread 2456237.2456238] Thread 2 "detach-step-ove" hit Breakpoint 1, thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54 54 counter++; /* Set breakpoint here. */ (gdb) commands Type commands for breakpoint(s) 1, one per line. End with a line saying just "end". >p (void*) malloc (1) >end (gdb) c Continuing. Thread 2 "detach-step-ove" hit Breakpoint 1, thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54 54 counter++; /* Set breakpoint here. */ ^C^C^C^C^C^C help me I am stuck ^C^C So above we have: - remote target + "maint set target-non-stop on" to ensure we go via the commit-resume paths - a breakpoint with a command that does an infcall When the breakpoint triggers, GDB runs its command list, which does an infcall that never actually runs. GDB never resumes the target and then hangs waiting for a stop. In this scenario, the breakpoint commands are run here: (top-gdb) bt #0 bpstat_do_actions () at /home/pedro/gdb/binutils-gdb/src/gdb/breakpoint.c:4599 #1 0x0000557c5d698f7f in inferior_event_handler (event_type=INF_EXEC_COMPLETE) at /home/pedro/gdb/binutils-gdb/src/gdb/inf-loop.c:71 #2 0x0000557c5d6b76f6 in fetch_inferior_event () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:4051 There are are other paths that can lead to the same outcome. For example, fetch_inferior_event -> normal_stop -> execute_cmd_pre_hook runs the hook-stop, and a hook-stop may include a command that resumes the target. Also note that proceed can also call normal_stop directly, which can again lead to execute_cmd_pre_hook and to bpstat_do_actions. In all the cases that we end up resuming the target but not returning immediately, we "nest" an event loop, blocking in wait_sync_command_done. When we get to wait_sync_command_done, defer_enable_commit_resumed is still true, we never actually committed the resumes. So we wait forever for a target stop, which will never come. I think a fix could be for wait_sync_command_done to be a "synchronization" point, and have it clear defer_enable_commit_resumed and force a commit-resume. I suspect running the whole testsuite against extended-remote gdbserver with "maint set target-non-stop on" would catch these cases. > + /* If a thread from this target has some status to report, we better > + handle it before requiring the target to commit its resumed threads: > + handling the status might lead to resuming more threads. */ > + bool has_thread_with_pending_status = false; > + for (thread_info *thread : all_non_exited_threads (target)) > + if (thread->resumed && thread->suspend.waitstatus_pending_p) Typically the "we (had) better do X" idiom expresses urgency lest we face possible consequences, like "we had better finish this patch before the deadline!" But here, I don't think it's the case -- letting the target commit its resumed threads may just not be optimal. I'd suggest rewording the comment to avoid the idiom, since as is it reads to me like resuming immediately would be incorrect and have really bad consequences somehow (and then the somehow isn't explained). > + if (m_prev_defer_value == false) > + { Any reason to write that, instead of: if (!m_prev_defer_value) ? Pedro Alves