From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Cc: Simon Marchi <simark@simark.ca>
Subject: Re: [PATCH] gdb: fix auxv cache clearing from new_objfile observer
Date: Fri, 06 Oct 2023 17:43:17 +0100 [thread overview]
Message-ID: <87y1gfsnve.fsf@redhat.com> (raw)
In-Reply-To: <c3a8e6149dd66bbc380bc8e03cbd0110233a5abd.1696349903.git.aburgess@redhat.com>
Andrew Burgess <aburgess@redhat.com> writes:
> It was pointed out on the mailing list that a recently added
> test (gdb.python/py-progspace-events.exp) was failing when run with
> the native-extended-gdbserver board. This test was added with this
> commit:
>
> commit 59912fb2d22f8a4bb0862487f12a5cc65b6a013f
> Date: Tue Sep 19 11:45:36 2023 +0100
>
> gdb: add Python events for program space addition and removal
>
> It turns out though that the test is failing due to a existing bug
> in GDB, the new test just exposes the problem. Additionally, the
> failure really doesn't even rely on the new functionality added in the
> above commit. I reduced the test to a simple set of steps that
> reproduced the failure and tested against GDB 13, and the test passes;
> so the bug was introduced since then. In fact, the bug was introduced
> with this commit:
>
> commit a2827364e2bf19910fa5a54364a594a5ba3033b8
> Date: Fri Sep 8 15:48:16 2023 +0100
>
> gdb: remove final user of the executable_changed observer
>
> This commit changed how the per-inferior auxv data cache is managed,
> specifically, when the cache is cleared, and it is this that leads to
> the failure.
>
> This bug is interesting because it exposes a number of issues with
> GDB, I'll explain all of the problems I see, though ultimately, I only
> propose fixing one problem in this commit, which is enough to resolve
> the crash we are currently seeing.
>
> The crash that we are seeing manifests like this:
>
> ...
> [Inferior 2 (process 3970384) exited normally]
> +inferior 1
> [Switching to inferior 1 [process 3970383] (/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events)]
> [Switching to thread 1.1 (Thread 3970383.3970383)]
> #0 breakpt () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-progspace-events.c:28
> 28 { /* Nothing. */ }
> (gdb) step
> +step
> terminate called after throwing an instance of 'gdb_exception_error'
>
> Fatal signal: Aborted
> ... etc ...
>
> What's happening is that GDB attempts to refile the auxv cache as a
> result of the gdbarch_has_shared_address_space call in
> program_space::~program_space, the backtrace looks like this:
>
> #0 0x00007fb4f419a9a5 in raise () from /lib64/libpthread.so.0
> #1 0x00000000008b635d in handle_fatal_signal (sig=6) at ../../src/gdb/event-top.c:912
> #2 <signal handler called>
> #3 0x00007fb4f38e3625 in raise () from /lib64/libc.so.6
> #4 0x00007fb4f38cc8d9 in abort () from /lib64/libc.so.6
> #5 0x00007fb4f3c70756 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
> #6 0x00007fb4f3c7c6dc in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
> #7 0x00007fb4f3c7b6e9 in __cxa_call_terminate () from /lib64/libstdc++.so.6
> #8 0x00007fb4f3c7c094 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
> #9 0x00007fb4f3a80c63 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1
> #10 0x00007fb4f3a8154e in _Unwind_Resume () from /lib64/libgcc_s.so.1
> #11 0x0000000000e8832d in target_read_alloc_1<unsigned char> (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2266
> #12 0x0000000000e73dea in target_read_alloc (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2315
> #13 0x000000000058248c in target_read_auxv_raw (ops=0x408a3a0) at ../../src/gdb/auxv.c:379
> #14 0x000000000058243d in target_read_auxv () at ../../src/gdb/auxv.c:368
> #15 0x000000000058255c in target_auxv_search (match=0x0, valp=0x7ffdee17c598) at ../../src/gdb/auxv.c:415
> #16 0x0000000000a464bb in linux_is_uclinux () at ../../src/gdb/linux-tdep.c:433
> #17 0x0000000000a464f6 in linux_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/linux-tdep.c:440
> #18 0x0000000000510eae in gdbarch_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/gdbarch.c:4889
> #19 0x0000000000bc7558 in program_space::~program_space (this=0x4544aa0, __in_chrg=<optimized out>) at ../../src/gdb/progspace.c:124
> #20 0x00000000009b245d in delete_inferior (inf=0x47b3de0) at ../../src/gdb/inferior.c:290
> #21 0x00000000009b2c10 in prune_inferiors () at ../../src/gdb/inferior.c:480
> #22 0x00000000009c5e3e in fetch_inferior_event () at ../../src/gdb/infrun.c:4558
> #23 0x000000000099b4dc in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42
> #24 0x0000000000cbc64f in remote_async_serial_handler (scb=0x4090a30, context=0x408a6b0) at ../../src/gdb/remote.c:14859
> #25 0x0000000000d83d3a in run_async_handler_and_reschedule (scb=0x4090a30) at ../../src/gdb/ser-base.c:138
> #26 0x0000000000d83e1f in fd_event (error=0, context=0x4090a30) at ../../src/gdb/ser-base.c:189
>
> So this is problem #1, if we throw an exception while deleting a
> program_space then this is not caught, and is going to crash GDB.
>
> Problem #2 becomes evident when we ask why GDB is throwing an error in
> this case; the error is thrown because the remote target, operating in
> non-async mode, can't read the auxv data while an inferior is running
> and GDB is waiting for a stop reply. The problem here then, is why
> does GDB get into a position where it tries to interact with the
> remote target in this way, at this time? The problem is caused by the
> prune_inferiors call which can be seen in the above backtrace.
>
> In prune_inferiors we check if the inferior is deletable, and if it
> is, we delete it. The problem is, I think, we should also check if
> the target is currently in a state that would allow us to delete the
> inferior. We don't currently have such a check available, we'd need
> to add one, but for the remote target, this would return false if the
> remote is in async mode and the remote is currently waiting for a stop
> reply. With this change in place GDB would defer deleting the
> inferior until the remote target has stopped, at which point GDB would
> be able to refill the auxv cache successfully.
>
> And then, problem #3 becomes evident when we ask why GDB is needing to
> refill the auxv cache now when it didn't need to for GDB 13. This is
> where the second commit mentioned above (a2827364e2bf) comes in.
> Prior to this commit, the auxv cache was cleared by the
> executable_changed observer, while after that commit the auxv cache
> was cleared by the new_objfile observer -- but only when the
> new_objfile observer is used in the special mode that actually means
> that all objfiles have been unloaded (I know, the overloading of the
> new_objfile observer is horrible, and unnecessary, but it's not really
> important for this bug).
>
> The difference arises because the new_objfile observer is triggered
> from clear_symtab_users, which in turn is called from
> program_space::~program_space. The new_objfile observer for auxv does
> this:
>
> static void
> auxv_new_objfile_observer (struct objfile *objfile)
> {
> if (objfile == nullptr)
> invalidate_auxv_cache_inf (current_inferior ());
> }
>
> That is, when all the objfiles are unloaded, we clear the auxv cache
> for the current inferior.
>
> The problem is, then when we look at the prune_inferiors ->
> delete_inferior -> ~program_space path, we see that the current
> inferior is not going to be an inferior that exists within the
> program_space being deleted; delete_inferior removes the deleted
> inferior from the global inferior list, and then only deletes the
> program_space if program_space::empty() returns true, which is only
> the case if the current inferior isn't within the program_space to
> delete, and no other inferior exists within that program_space
> either.
>
> What this means is that when the new_objfile observer is called we
> can't rely on the current inferior having any relationship with the
> program space in which the objfiles were removed. This was an error
> in the commit a2827364e2bf, the only thing we can rely on is the
> current program space. As a result of this mistake, after commit
> a2827364e2bf, GDB was sometimes clearing the auxv cache for a random
> inferior. In the native target case this was harmless as we can
> always refill the cache when needed, but in the remote target case, if
> we need to refill the cache when the remote target is executing, then
> we get the crash we observed.
>
> And additionally, if we think about this a little more, we see that
> commit a2827364e2bf made another mistake. When all the objfiles are
> removed, they are removed from a program_space, a program_space might
> contain multiple inferiors, so surely, we should clear the auxv cache
> for all of the matching inferiors?
>
> Given these two insights, that the current_inferior is not relevant,
> only the current_program_space, and that we should be clearing the
> cache for all inferiors in the current_program_space, we can update
> auxv_new_objfile_observer to:
>
> if (objfile == nullptr)
> {
> for (inferior *inf : all_inferiors ())
> {
> if (inf->pspace == current_program_space)
> invalidate_auxv_cache_inf (inf);
> }
> }
>
> With this change we now correctly clear the auxv cache for the correct
> inferiors, and GDB no longer needs to refill the cache at an
> inconvenient time, this avoids the crash we were seeing.
>
> And finally, we reach problem #4. Inspired by the observation that
> using the current_inferior from within the ~program_space function was
> not correct, I added some debug to see if current_inferior() was
> called anywhere else (below ~program_space), and the answer is yes,
> it's called a often. Mostly the culprit is GDB doing:
>
> current_inferior ()->top_target ()-> ....
>
> But I think all of these calls are most likely doing the wrong thing,
> and only work because the top target in all these cases is shared
> between all inferiors, e.g. it's the native target, or the remote
> target for all inferiors. But if we had a truly multi-connection
> setup, then we might start to see odd behaviour.
>
> Problem #1 I'm just ignoring for now, I guess at some point we might
> run into this again, and then we'd need to solve this. But in this
> case I wasn't sure what a "good" solution would look like. We need
> the auxv data in order to implement the linux_is_uclinux() function.
> If we can't get the auxv data then what should we do, assume yes, or
> assume no? The right answer would probably be to propagate the error
> back up the stack, but then we reach ~program_space, and throwing
> exceptions from a destructor is problematic, so we'd need to catch and
> deal at this point. The linux_is_uclinux() call is made from within
> gdbarch_has_shared_address_space(), which is used like:
>
> if (!gdbarch_has_shared_address_space (target_gdbarch ()))
> delete this->aspace;
>
> So, we would have to choose; delete the address space or not. If we
> delete it on error, then we might delete an address space that is
> shared within another program space. If we don't delete the address
> space, then we might leak it. Neither choice is great.
>
> A better solution might be to have the address spaces be reference
> counted, then we could remove the gdbarch_has_shared_address_space
> call completely, and just rely on the reference count to auto-delete
> the address space when appropriate.
>
> The solution for problem #2 I already hinted at above, we should have
> a new target_can_delete_inferiors() call, which should be called from
> prune_inferiors, this would prevent GDB from trying to delete
> inferiors when a (remote) target is in a state where we know it can't
> delete the inferior. Deleting an inferior often (always?) requires
> sending packets to the remote, and if the remote is waiting for a stop
> reply then this will never work, so the pruning should be deferred in
> this case.
>
> The solution for problem #3 is included in this commit.
>
> And, for problem #4, I'm not sure what the right solution is. Maybe
> delete_inferior should ensure the inferior to be deleted is in place
> when ~program_space is called? But that seems a little weird, as the
> current inferior would, in theory, still be using the current
> program_space...
>
> Anyway, after this commit, the gdb.python/py-progspace-events.exp test
> now passes when run with the native-extended-remote board.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30935
This commit was merged (1b28c0f488bb).
Thanks,
Andrew
> ---
> gdb/auxv.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gdb/auxv.c b/gdb/auxv.c
> index 9f599b04a4f..cfb44ffa0bb 100644
> --- a/gdb/auxv.c
> +++ b/gdb/auxv.c
> @@ -351,7 +351,16 @@ static void
> auxv_new_objfile_observer (struct objfile *objfile)
> {
> if (objfile == nullptr)
> - invalidate_auxv_cache_inf (current_inferior ());
> + {
> + /* When OBJFILE is nullptr, this indicates that all symbol files have
> + been unloaded from the current program space. Discard cached auxv
> + information from any inferior within the effected program space. */
> + for (inferior *inf : all_inferiors ())
> + {
> + if (inf->pspace == current_program_space)
> + invalidate_auxv_cache_inf (inf);
> + }
> + }
> }
>
> /* See auxv.h. */
>
> base-commit: e030ce2c79feee6a35665a6580a23dcf937ea46f
> --
> 2.25.4
prev parent reply other threads:[~2023-10-06 16:43 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-03 16:20 Andrew Burgess
2023-10-03 17:50 ` Simon Marchi
2023-10-05 17:20 ` Simon Marchi
2023-11-06 8:21 ` Tom de Vries
2023-10-06 16:43 ` Andrew Burgess [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y1gfsnve.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=simark@simark.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).