public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Pedro Alves <pedro@palves.net>
To: Lancelot SIX <lsix@lancelotsix.com>, Simon Marchi <simark@simark.ca>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH 09/11] Ensure EXIT is last event, gdb/linux
Date: Wed, 9 Mar 2022 14:45:10 +0000	[thread overview]
Message-ID: <f88f7277-3801-d07e-d340-b148ed64d18e@palves.net> (raw)
In-Reply-To: <20220309002120.bbmp67zkiiwk4hpr@Plymouth>

On 2022-03-09 00:21, Lancelot SIX wrote:
> On Mon, Mar 07, 2022 at 03:24:49PM -0500, Simon Marchi wrote:
>> On 2022-03-03 09:40, Pedro Alves wrote:
>>> This is problematic for infrun.c:stop_all_threads, which asks the
>>> target to report all thread exit events to infrun.  For example, in
>>> stop_all_threads, core GDB counts 2 threads that needs to be stopped.
>>> It then asks the target to stop those 2 threads (with
>>> target_stop(ptid)), and waits for 2 events to come back from the
>>> target.  Unfortunately, when waiting for events, the linux-nat target,
>>> due to the random event selecting mentioned above, reports the
>>> whole-process EXIT event even though the other thread has exited and
>>> its exit hasn't been reported yet.  As a consequence, stop_all_threads
>>> receives one event, but waits indefinitely for the second one to come.
>>> Effectively, GDB hangs forever.
>>
>> I don't really understand how we end up waiting forever.  Why does
>> reporting the leader exit event first make it so we discard the event
>> for the other thread?
>>
>> Other than that, it looks sensible to me to ensure we return events in
>> this order.
>>
>> Simon
> 
> Hi,
> 
> You are right, this patch is not directly linked to GDB hanging forever.
> It was written as part of a series originally started to fix a hang,
> which ended up handling more.  I think there have been mixup when
> reworking the patches before sending.

My bad, I based the commit message off of the wrong patch's original commit
message, and never questioned it.  :-P

> 
> The hang can happen because of the false positive of the zombie
> detection (which is worked around in another patch in this series).  

Right, the zombie detection patch (#7) also ensures that we only ever
report the process-exit event for the leader thread, instead of whatever
is the last lwp in the list, which is what fixes it.

> In
> this case, if GDB is waiting for stop events from 2 threads in
> stop_all_threads and the zombie detection drops a thread at that moment,
> then only the second thread reports an event, causing GDB to hang
> forever waiting for an event from the first thread.  Hanging issue left
> aside, we end up with a process terminating with the exit event + exit
> code from a non leader thread which is incorrect.

Thanks.  I double checked now that this is already described in patch #7's
commit log, so there's nothing to change in any other patch, AFAICT.

> 
> This lead us to realize that there are other situations where the
> process exit is not reported with the correct thread, and so the exit
> code returned to infrun is not the expected one.
> 
> When a multithreaded process terminates, we are supposed to first see
> the exit of each non-leader thread, and only then the exit of the leader
> thread.  This is what we pulled from the kernel using waitpid in
> linux-nat.  However linux-nat uses some "fairness" and randomly selects
> one of the events it cached to report to infrun first.  In current GDB
> (i.e.  before this series), if the random function chooses the exit from
> the leader first, linux-nat generates a THREAD_EXITED event for it, and
> will generated the EXITED event when it sees the last remaining
> (non-leader) thread exit.  The exit code associated with this EXITED
> event is the one of the non-leader thread, and this is wrong.

Right.  This is no longer a concern after patch #7, though, as linux-nat
will only ever report the EXITED event for the leader.

> 
> This patch improves the random selection and ensure that we do not
> report the leader exit as long as there are non-leader exit events
> pending (and there should be).  For any well-behaved process, this
> ensures that we report the correct exit code with the EXITED event.

Right, agreed, if applied in isolation.  It would help in that case, though if the
zombie detection kicked in, the issue would still trigger, as you're aware.  But since
patch #7 ensured we only ever report the exit for the leader, that patch fixed both
the "well-behaved" case, and the case the leader really exits.

So the only remaining effect of this patch is the ensuring a natural order
of events out of linux-nat, and we must evaluate it on those grounds alone now.

> 
> This overall helps to reason in infrun: we receive the THREAD_EXITED
> events first, and the EXITED event last.
> 
> We will rework the commit message and remove the mention of the hang in
> this patch.
> 

As we discussed yesterday internally, for stop_all_threads, it actually doesn't
matter which order the events are sent, as all events will be left pending.
Later on, infrun itself will pick one event at random between all pending
events on the infrun side (infrun.c:random_pending_event_thread), and may well pick up
the EXITED event before the THREAD_EXITED event anyhow, so to ensure proper ordering
throughout we'd need to tweak infrun as well.

There's also another point that we're not addressing yet, which is that EXECD events are
similar to EXITED events, in that they are process-wide events.  For example, if a non-leader
thread returns some event to infrun and it is left pending, and then the leader execs, and
the target reports an EXECD event for it and that exec event is also left pending, we'd want to
be sure that the non-leader event is never ever processed after the EXECD event, as that
would result in GDB thinking that the post-exec program had a second thread, when it really
does not.

I think the "badness" with exec does not happen in practice on native Linux currently, because
before the leader sees an exec event out of the kernel, it must see an exit event for the other
thread, even if it was ptrace-stopped, and so linux-nat deletes the non-leader thread before the
leader sees the execd, and as the pending events as stored in the thread, the pending event is
deleted as well everywhere automatically.

However, for remote/gdbserver, it's not so clear something odd can't happen, because gdbserver's
linux-low.cc deleting a thread does not immediately result in gdb's corresponding thread being
deleted.  I'm not sure we're guaranteed to delete the non-leader's pending event in all stages 
that queue events.  I'm thinking of gdbserver's notification queue and remote.c's queue.

So overall, I'm thinking of dropping this patch from the series for now, as it's not really
needed AFAICT, and is incomplete.

I've dropped patches #9 and #10 locally, and left the testcase (patch #11) running in a loop
testing both native and gdbserver for over an hour and saw no failures.  Any objections
to pushing the series without those two patches?

Pedro Alves

  reply	other threads:[~2022-03-09 14:45 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-03 14:40 [PATCH 00/11] Linux: Fix issues around thread group leader exits Pedro Alves
2022-03-03 14:40 ` [PATCH 01/11] Fix gdbserver/linux target_waitstatus logging assert Pedro Alves
2022-03-03 14:40 ` [PATCH 02/11] Fix gdb.threads/clone-new-thread-event.exp race Pedro Alves
2022-03-03 14:40 ` [PATCH 03/11] Fix gdb.threads/current-lwp-dead.exp race Pedro Alves
2022-03-03 14:40 ` [PATCH 04/11] gdb: Reorganize linux_nat_filter_event Pedro Alves
2022-03-03 14:40 ` [PATCH 05/11] gdbserver: Reorganize linux_process_target::filter_event Pedro Alves
2022-03-03 14:40 ` [PATCH 06/11] gdbserver: Reindent check_zombie_leaders Pedro Alves
2022-03-03 14:40 ` [PATCH 07/11] Re-add zombie leader on exit, gdb/linux Pedro Alves
2022-03-07 20:08   ` Simon Marchi
2022-03-07 20:27     ` Pedro Alves
2022-03-07 20:31       ` Simon Marchi
2022-03-09 14:37         ` Pedro Alves
2022-03-03 14:40 ` [PATCH 08/11] Re-add zombie leader on exit, gdbserver/linux Pedro Alves
2022-03-03 14:40 ` [PATCH 09/11] Ensure EXIT is last event, gdb/linux Pedro Alves
2022-03-07 20:24   ` Simon Marchi
2022-03-09  0:21     ` Lancelot SIX
2022-03-09 14:45       ` Pedro Alves [this message]
2022-03-09 22:29         ` Lancelot SIX
2022-03-10 11:46           ` Pedro Alves
2022-03-03 14:40 ` [PATCH 10/11] Ensure EXIT is last event, gdbserver/linux Pedro Alves
2022-03-03 14:40 ` [PATCH 11/11] Process exit status is leader exit status testcase Pedro Alves
2023-06-22 11:28   ` Ilya Leoshkevich
2023-06-22 13:07     ` Tom de Vries

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f88f7277-3801-d07e-d340-b148ed64d18e@palves.net \
    --to=pedro@palves.net \
    --cc=gdb-patches@sourceware.org \
    --cc=lsix@lancelotsix.com \
    --cc=simark@simark.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).