public inbox for gdb-prs@sourceware.org
help / color / mirror / Atom feed
* [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
@ 2023-10-02 21:15 aburgess at redhat dot com
  2023-10-02 21:15 ` [Bug gdb/30935] " aburgess at redhat dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: aburgess at redhat dot com @ 2023-10-02 21:15 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

            Bug ID: 30935
           Summary: Regression exposed by new
                    gdb.python/py-progspace-events.exp  test and
                    native-extended-gdbserver board
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: aburgess at redhat dot com
  Target Milestone: ---

New test gdb.python/py-progspace-events.exp was added with commit:

  commit 59912fb2d22f8a4bb0862487f12a5cc65b6a013f
  Date:   Tue Sep 19 11:45:36 2023 +0100

      gdb: add Python events for program space addition and removal

This test fails when run with the native-extended-gdbserver board.  The failure
mode is that GDB crashes when std::terminate is called while handling an
exception.

If the same test is run using GDB 13 then GDB doesn't crash -- obviously the
events part of the test which uses the new functionality from the above commit
FAILs, but GDB doesn't crash.

All the test does is arrange to follow the parent and child after a fork, and
then allows the child process to exit, before trying to step the parent, this
doesn't feel unreasonable.

The crash can be reproduced using these commands:

  set trace-commands on
  set height 0
  set width 0
  set sysroot
  target extended-remote | ./gdbserver/gdbserver --once --multi -
/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events
  file
/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events
  set remote exec-file
/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events
  break -qualified main
  run
  break breakpt
  continue
  set detach-on-fork off
  continue
  inferior 2
  continue
  continue
  inferior 1
  step

On master (59912fb2d22) I see this fail with:

  +step
  terminate called after throwing an instance of 'gdb_exception_error'


  Fatal signal: Aborted
  ... etc ...

Simon reported this issue here:
https://inbox.sourceware.org/gdb-patches/18e22709-5bec-4175-a057-09ee92415a46@simark.ca/

The crash backtrace from his report is:

    #4  0x000055a4a0fae18d in error (fmt=0x55a49adcd020 "Cannot execute this
command while the target is running.\nUse the \"interrupt\" command to stop the
target\nand then try again.") at
/home/smarchi/src/binutils-gdb/gdbsupport/errors.cc:46
    #5  0x000055a49ec2f495 in remote_target::putpkt_binary
(this=0x61b00003f080, buf=0x62d00371e400 "qXfer:auxv:read::0,1000", cnt=23) at
/home/smarchi/src/binutils-gdb/gdb/remote.c:9740
    #6  0x000055a49ec2efc4 in remote_target::putpkt (this=0x61b00003f080,
buf=0x62d00371e400 "qXfer:auxv:read::0,1000") at
/home/smarchi/src/binutils-gdb/gdb/remote.c:9698
    #7  0x000055a49ec8b287 in remote_target::putpkt (this=0x61b00003f080,
buf=std::__debug::vector of length 36862, capacity 36862 = {...}) at
/home/smarchi/src/binutils-gdb/gdb/remote.c:1211
    #8  0x000055a49ec45c59 in remote_target::remote_read_qxfer
(this=0x61b00003f080, object_name=0x55a49adced00 "auxv", annex=0x0,
readbuf=0x62100057f900 '\276' <repeats 200 times>..., offset=0,
len=3904966623561266976, xfered_len=0x7f5f31c5f220, which_packet=18) at
/home/smarchi/src/binutils-gdb/gdb/remote.c:11316
    #9  0x000055a49ec47573 in remote_target::xfer_partial (this=0x61b00003f080,
object=TARGET_OBJECT_AUXV, annex=0x0, readbuf=0x62100057f900 '\276' <repeats
200 times>..., writebuf=0x0, offset=0, len=4096, xfered_len=0x7f5f31c5f220) at
/home/smarchi/src/binutils-gdb/gdb/remote.c:11438
    #10 0x000055a49f4595b1 in target_xfer_partial (ops=0x61b00003f080,
object=TARGET_OBJECT_AUXV, annex=0x0, readbuf=0x62100057f900 '\276' <repeats
200 times>..., writebuf=0x0, offset=0, len=4096, xfered_len=0x7f5f31c5f220) at
/home/smarchi/src/binutils-gdb/gdb/target.c:1717
    #11 0x000055a49f45aed9 in target_read_partial (ops=0x61b00003f080,
object=TARGET_OBJECT_AUXV, annex=0x0, buf=0x62100057f900 '\276' <repeats 200
times>..., offset=0, len=4096, xfered_len=0x7f5f31c5f220) at
/home/smarchi/src/binutils-gdb/gdb/target.c:1951
    #12 0x000055a49f501472 in target_read_alloc_1<unsigned char>
(ops=0x61b00003f080, object=TARGET_OBJECT_AUXV, annex=0x0) at
/home/smarchi/src/binutils-gdb/gdb/target.c:2286
    #13 0x000055a49f45c9c2 in target_read_alloc (ops=0x61b00003f080,
object=TARGET_OBJECT_AUXV, annex=0x0) at
/home/smarchi/src/binutils-gdb/gdb/target.c:2315
    #14 0x000055a49c920733 in target_read_auxv_raw (ops=0x61b00003f080) at
/home/smarchi/src/binutils-gdb/gdb/auxv.c:379
    #15 0x000055a49c920501 in target_read_auxv () at
/home/smarchi/src/binutils-gdb/gdb/auxv.c:368
    #16 0x000055a49c920c06 in target_auxv_search (match=0, valp=0x7f5f31a2fc60)
at /home/smarchi/src/binutils-gdb/gdb/auxv.c:415
    #17 0x000055a49e13e669 in linux_is_uclinux () at
/home/smarchi/src/binutils-gdb/gdb/linux-tdep.c:433
    #18 0x000055a49e13e6fa in linux_has_shared_address_space
(gdbarch=0x61c00001e080) at /home/smarchi/src/binutils-gdb/gdb/linux-tdep.c:440
    #19 0x000055a49c720d6f in gdbarch_has_shared_address_space
(gdbarch=0x61c00001e080) at /home/smarchi/src/binutils-gdb/gdb/gdbarch.c:4889
    #20 0x000055a49e7ce076 in program_space::~program_space
(this=0x61300004af00, __in_chrg=<optimized out>) at
/home/smarchi/src/binutils-gdb/gdb/progspace.c:124
    #21 0x000055a49de5d91e in delete_inferior (inf=0x61800007f080) at
/home/smarchi/src/binutils-gdb/gdb/inferior.c:290
    #22 0x000055a49de604b7 in prune_inferiors () at
/home/smarchi/src/binutils-gdb/gdb/inferior.c:480
    #23 0x000055a49debd2d9 in fetch_inferior_event () at
/home/smarchi/src/binutils-gdb/gdb/infrun.c:4558

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
@ 2023-10-02 21:15 ` aburgess at redhat dot com
  2023-10-02 22:58 ` aburgess at redhat dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: aburgess at redhat dot com @ 2023-10-02 21:15 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

Andrew Burgess <aburgess at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.1

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
  2023-10-02 21:15 ` [Bug gdb/30935] " aburgess at redhat dot com
@ 2023-10-02 22:58 ` aburgess at redhat dot com
  2023-10-03 13:56 ` simon.marchi at polymtl dot ca
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: aburgess at redhat dot com @ 2023-10-02 22:58 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

--- Comment #1 from Andrew Burgess <aburgess at redhat dot com> ---
Bisected the problem back to commit:

  commit a2827364e2bf19910fa5a54364a594a5ba3033b8
  Author: Andrew Burgess <aburgess@redhat.com>
  Date:   Fri Sep 8 15:48:16 2023 +0100

      gdb: remove final user of the executable_changed observer

which is also one of mine.  That commit changed when the auxv cache was
cleared, so my guess is we're discarding the data when we previous weren't,
which is why GDB is now trying to re-read the data, which is leading to the
crash we see.

I'll continue to dig into this and get a fix out asap.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
  2023-10-02 21:15 ` [Bug gdb/30935] " aburgess at redhat dot com
  2023-10-02 22:58 ` aburgess at redhat dot com
@ 2023-10-03 13:56 ` simon.marchi at polymtl dot ca
  2023-10-03 14:25 ` simon.marchi at polymtl dot ca
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: simon.marchi at polymtl dot ca @ 2023-10-03 13:56 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

Simon Marchi <simon.marchi at polymtl dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simon.marchi at polymtl dot ca

--- Comment #2 from Simon Marchi <simon.marchi at polymtl dot ca> ---
Thanks, I reproduced it and made a minimal one-liner:

  ./gdb -batch -nx -q --data-directory=data-directory -ex "tar ext |
../gdbserver/gdbserver --once - a.out" -ex "set detach-on-fork off" -ex "tb
parent" -ex c -ex "inferior 2" -ex c -ex "inferior 1" -ex step

using the following program:

```
#include <unistd.h>

void parent () {}
void child () {}

int main()
{
  if (fork ()) {
      parent();
  } else {
      child();
  }
}
```

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
                   ` (2 preceding siblings ...)
  2023-10-03 13:56 ` simon.marchi at polymtl dot ca
@ 2023-10-03 14:25 ` simon.marchi at polymtl dot ca
  2023-10-03 16:23 ` aburgess at redhat dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: simon.marchi at polymtl dot ca @ 2023-10-03 14:25 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

--- Comment #3 from Simon Marchi <simon.marchi at polymtl dot ca> ---
One thing that is really fishy in ~current_program_space:

We have this to ensure the program space being deleted is not current:

  gdb_assert (this != current_program_space);

But we also have this that relies on the current inferior:

  if (!gdbarch_has_shared_address_space (target_gdbarch ()))
    delete this->aspace;

So let's say you have two inferiors, connected to two different targets.  The
current inferior is inferior 1, and you do "remove-inferiors 2".  We'll get the
has_shared_address_spaces property from inferior 1's arch, even though we want
to query about inferior 2's arch.

I spotted this because the crash happens inside this
gdbarch_has_shared_address_space.

I wonder if we can just make program_space::aspace a shared_ptr, it would
auto-delete the aspace when there are no more users, removing the complexity of
deciding when to delete it.  Also, as it stands, we never delete the aspace (I
think) if gdbarch_has_shared_address_space is true.  So imagine you connect to
a remote where gdbarch_has_shared_address_space is true, debug a few inferiors,
then disconnect.  I guess we will possibly have created an aspace and never
deleted it.  Using a shared_ptr would take care of that.  Oh, and would also
fix the crash reported in this bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
                   ` (3 preceding siblings ...)
  2023-10-03 14:25 ` simon.marchi at polymtl dot ca
@ 2023-10-03 16:23 ` aburgess at redhat dot com
  2023-10-05 17:23 ` cvs-commit at gcc dot gnu.org
  2023-10-05 17:24 ` simon.marchi at polymtl dot ca
  6 siblings, 0 replies; 8+ messages in thread
From: aburgess at redhat dot com @ 2023-10-03 16:23 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

--- Comment #4 from Andrew Burgess <aburgess at redhat dot com> ---
I also noticed the problems with address_space, and that if we reference
counted (shared_ptr) the address_space object that would have avoided this
problem.

However, I think that the commit I identified (a2827364e2bf19) does include a
bug, which should be fixed.  To this end I've posted this patch:

 
https://inbox.sourceware.org/gdb-patches/c3a8e6149dd66bbc380bc8e03cbd0110233a5abd.1696349903.git.aburgess@redhat.com/T/#u

With that patch merged then the problem we're running into here goes away.  At
least for now.

I think moving to using a ref-counted address_space would fix a bunch of
problems, but I'm reluctant to try and get that in just before we branch a
release -- also that feels like a much bigger task.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
                   ` (4 preceding siblings ...)
  2023-10-03 16:23 ` aburgess at redhat dot com
@ 2023-10-05 17:23 ` cvs-commit at gcc dot gnu.org
  2023-10-05 17:24 ` simon.marchi at polymtl dot ca
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-10-05 17:23 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

--- Comment #5 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=1b28c0f488bb84c8cc9f680afa9680dc0b0e2b3d

commit 1b28c0f488bb84c8cc9f680afa9680dc0b0e2b3d
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Tue Oct 3 17:20:14 2023 +0100

    gdb: fix auxv cache clearing from new_objfile observer

    It was pointed out on the mailing list that a recently added
    test (gdb.python/py-progspace-events.exp) was failing when run with
    the native-extended-gdbserver board.  This test was added with this
    commit:

      commit 59912fb2d22f8a4bb0862487f12a5cc65b6a013f
      Date:   Tue Sep 19 11:45:36 2023 +0100

          gdb: add Python events for program space addition and removal

    It turns out though that the test is failing due to a existing bug
    in GDB, the new test just exposes the problem.  Additionally, the
    failure really doesn't even rely on the new functionality added in the
    above commit.  I reduced the test to a simple set of steps that
    reproduced the failure and tested against GDB 13, and the test passes;
    so the bug was introduced since then.  In fact, the bug was introduced
    with this commit:

      commit a2827364e2bf19910fa5a54364a594a5ba3033b8
      Date:   Fri Sep 8 15:48:16 2023 +0100

          gdb: remove final user of the executable_changed observer

    This commit changed how the per-inferior auxv data cache is managed,
    specifically, when the cache is cleared, and it is this that leads to
    the failure.

    This bug is interesting because it exposes a number of issues with
    GDB, I'll explain all of the problems I see, though ultimately, I only
    propose fixing one problem in this commit, which is enough to resolve
    the crash we are currently seeing.

    The crash that we are seeing manifests like this:

      ...
      [Inferior 2 (process 3970384) exited normally]
      +inferior 1
      [Switching to inferior 1 [process 3970383]
(/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events)]
      [Switching to thread 1.1 (Thread 3970383.3970383)]
      #0  breakpt () at
/tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-progspace-events.c:28
      28    { /* Nothing.  */ }
      (gdb) step
      +step
      terminate called after throwing an instance of 'gdb_exception_error'

      Fatal signal: Aborted
      ... etc ...

    What's happening is that GDB attempts to refill the auxv cache as a
    result of the gdbarch_has_shared_address_space call in
    program_space::~program_space, the backtrace looks like this:

      #0  0x00007fb4f419a9a5 in raise () from /lib64/libpthread.so.0
      #1  0x00000000008b635d in handle_fatal_signal (sig=6) at
../../src/gdb/event-top.c:912
      #2  <signal handler called>
      #3  0x00007fb4f38e3625 in raise () from /lib64/libc.so.6
      #4  0x00007fb4f38cc8d9 in abort () from /lib64/libc.so.6
      #5  0x00007fb4f3c70756 in __gnu_cxx::__verbose_terminate_handler() [clone
.cold] () from /lib64/libstdc++.so.6
      #6  0x00007fb4f3c7c6dc in __cxxabiv1::__terminate(void (*)()) () from
/lib64/libstdc++.so.6
      #7  0x00007fb4f3c7b6e9 in __cxa_call_terminate () from
/lib64/libstdc++.so.6
      #8  0x00007fb4f3c7c094 in __gxx_personality_v0 () from
/lib64/libstdc++.so.6
      #9  0x00007fb4f3a80c63 in _Unwind_RaiseException_Phase2 () from
/lib64/libgcc_s.so.1
      #10 0x00007fb4f3a8154e in _Unwind_Resume () from /lib64/libgcc_s.so.1
      #11 0x0000000000e8832d in target_read_alloc_1<unsigned char>
(ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at
../../src/gdb/target.c:2266
      #12 0x0000000000e73dea in target_read_alloc (ops=0x408a3a0,
object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2315
      #13 0x000000000058248c in target_read_auxv_raw (ops=0x408a3a0) at
../../src/gdb/auxv.c:379
      #14 0x000000000058243d in target_read_auxv () at ../../src/gdb/auxv.c:368
      #15 0x000000000058255c in target_auxv_search (match=0x0,
valp=0x7ffdee17c598) at ../../src/gdb/auxv.c:415
      #16 0x0000000000a464bb in linux_is_uclinux () at
../../src/gdb/linux-tdep.c:433
      #17 0x0000000000a464f6 in linux_has_shared_address_space
(gdbarch=0x409a2d0) at ../../src/gdb/linux-tdep.c:440
      #18 0x0000000000510eae in gdbarch_has_shared_address_space
(gdbarch=0x409a2d0) at ../../src/gdb/gdbarch.c:4889
      #19 0x0000000000bc7558 in program_space::~program_space (this=0x4544aa0,
__in_chrg=<optimized out>) at ../../src/gdb/progspace.c:124
      #20 0x00000000009b245d in delete_inferior (inf=0x47b3de0) at
../../src/gdb/inferior.c:290
      #21 0x00000000009b2c10 in prune_inferiors () at
../../src/gdb/inferior.c:480
      #22 0x00000000009c5e3e in fetch_inferior_event () at
../../src/gdb/infrun.c:4558
      #23 0x000000000099b4dc in inferior_event_handler
(event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42
      #24 0x0000000000cbc64f in remote_async_serial_handler (scb=0x4090a30,
context=0x408a6b0) at ../../src/gdb/remote.c:14859
      #25 0x0000000000d83d3a in run_async_handler_and_reschedule
(scb=0x4090a30) at ../../src/gdb/ser-base.c:138
      #26 0x0000000000d83e1f in fd_event (error=0, context=0x4090a30) at
../../src/gdb/ser-base.c:189

    So this is problem #1, if we throw an exception while deleting a
    program_space then this is not caught, and is going to crash GDB.

    Problem #2 becomes evident when we ask why GDB is throwing an error in
    this case; the error is thrown because the remote target, operating in
    non-async mode, can't read the auxv data while an inferior is running
    and GDB is waiting for a stop reply.  The problem here then, is why
    does GDB get into a position where it tries to interact with the
    remote target in this way, at this time?  The problem is caused by the
    prune_inferiors call which can be seen in the above backtrace.

    In prune_inferiors we check if the inferior is deletable, and if it
    is, we delete it.  The problem is, I think, we should also check if
    the target is currently in a state that would allow us to delete the
    inferior.  We don't currently have such a check available, we'd need
    to add one, but for the remote target, this would return false if the
    remote is in async mode and the remote is currently waiting for a stop
    reply.  With this change in place GDB would defer deleting the
    inferior until the remote target has stopped, at which point GDB would
    be able to refill the auxv cache successfully.

    And then, problem #3 becomes evident when we ask why GDB is needing to
    refill the auxv cache now when it didn't need to for GDB 13.  This is
    where the second commit mentioned above (a2827364e2bf) comes in.
    Prior to this commit, the auxv cache was cleared by the
    executable_changed observer, while after that commit the auxv cache
    was cleared by the new_objfile observer -- but only when the
    new_objfile observer is used in the special mode that actually means
    that all objfiles have been unloaded (I know, the overloading of the
    new_objfile observer is horrible, and unnecessary, but it's not really
    important for this bug).

    The difference arises because the new_objfile observer is triggered
    from clear_symtab_users, which in turn is called from
    program_space::~program_space.  The new_objfile observer for auxv does
    this:

      static void
      auxv_new_objfile_observer (struct objfile *objfile)
      {
        if (objfile == nullptr)
          invalidate_auxv_cache_inf (current_inferior ());
      }

    That is, when all the objfiles are unloaded, we clear the auxv cache
    for the current inferior.

    The problem is, then when we look at the prune_inferiors ->
    delete_inferior -> ~program_space path, we see that the current
    inferior is not going to be an inferior that exists within the
    program_space being deleted; delete_inferior removes the deleted
    inferior from the global inferior list, and then only deletes the
    program_space if program_space::empty() returns true, which is only
    the case if the current inferior isn't within the program_space to
    delete, and no other inferior exists within that program_space
    either.

    What this means is that when the new_objfile observer is called we
    can't rely on the current inferior having any relationship with the
    program space in which the objfiles were removed.  This was an error
    in the commit a2827364e2bf, the only thing we can rely on is the
    current program space.  As a result of this mistake, after commit
    a2827364e2bf, GDB was sometimes clearing the auxv cache for a random
    inferior.  In the native target case this was harmless as we can
    always refill the cache when needed, but in the remote target case, if
    we need to refill the cache when the remote target is executing, then
    we get the crash we observed.

    And additionally, if we think about this a little more, we see that
    commit a2827364e2bf made another mistake.  When all the objfiles are
    removed, they are removed from a program_space, a program_space might
    contain multiple inferiors, so surely, we should clear the auxv cache
    for all of the matching inferiors?

    Given these two insights, that the current_inferior is not relevant,
    only the current_program_space, and that we should be clearing the
    cache for all inferiors in the current_program_space, we can update
    auxv_new_objfile_observer to:

      if (objfile == nullptr)
        {
          for (inferior *inf : all_inferiors ())
            {
              if (inf->pspace == current_program_space)
                invalidate_auxv_cache_inf (inf);
            }
        }

    With this change we now correctly clear the auxv cache for the correct
    inferiors, and GDB no longer needs to refill the cache at an
    inconvenient time, this avoids the crash we were seeing.

    And finally, we reach problem #4.  Inspired by the observation that
    using the current_inferior from within the ~program_space function was
    not correct, I added some debug to see if current_inferior() was
    called anywhere else (below ~program_space), and the answer is yes,
    it's called a often.  Mostly the culprit is GDB doing:

      current_inferior ()->top_target ()-> ....

    But I think all of these calls are most likely doing the wrong thing,
    and only work because the top target in all these cases is shared
    between all inferiors, e.g. it's the native target, or the remote
    target for all inferiors.  But if we had a truly multi-connection
    setup, then we might start to see odd behaviour.

    Problem #1 I'm just ignoring for now, I guess at some point we might
    run into this again, and then we'd need to solve this.  But in this
    case I wasn't sure what a "good" solution would look like.  We need
    the auxv data in order to implement the linux_is_uclinux() function.
    If we can't get the auxv data then what should we do, assume yes, or
    assume no?  The right answer would probably be to propagate the error
    back up the stack, but then we reach ~program_space, and throwing
    exceptions from a destructor is problematic, so we'd need to catch and
    deal at this point.  The linux_is_uclinux() call is made from within
    gdbarch_has_shared_address_space(), which is used like:

      if (!gdbarch_has_shared_address_space (target_gdbarch ()))
        delete this->aspace;

    So, we would have to choose; delete the address space or not.  If we
    delete it on error, then we might delete an address space that is
    shared within another program space.  If we don't delete the address
    space, then we might leak it.  Neither choice is great.

    A better solution might be to have the address spaces be reference
    counted, then we could remove the gdbarch_has_shared_address_space
    call completely, and just rely on the reference count to auto-delete
    the address space when appropriate.

    The solution for problem #2 I already hinted at above, we should have
    a new target_can_delete_inferiors() call, which should be called from
    prune_inferiors, this would prevent GDB from trying to delete
    inferiors when a (remote) target is in a state where we know it can't
    delete the inferior.  Deleting an inferior often (always?) requires
    sending packets to the remote, and if the remote is waiting for a stop
    reply then this will never work, so the pruning should be deferred in
    this case.

    The solution for problem #3 is included in this commit.

    And, for problem #4, I'm not sure what the right solution is.  Maybe
    delete_inferior should ensure the inferior to be deleted is in place
    when ~program_space is called?  But that seems a little weird, as the
    current inferior would, in theory, still be using the current
    program_space...

    Anyway, after this commit, the gdb.python/py-progspace-events.exp test
    now passes when run with the native-extended-remote board.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30935
    Approved-By: Simon Marchi <simon.marchi@efficios.com>
    Change-Id: I41f0e6e2d7ecc1e5e55ec170f37acd4052f46eaf

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug gdb/30935] Regression exposed by new gdb.python/py-progspace-events.exp  test and native-extended-gdbserver board
  2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
                   ` (5 preceding siblings ...)
  2023-10-05 17:23 ` cvs-commit at gcc dot gnu.org
@ 2023-10-05 17:24 ` simon.marchi at polymtl dot ca
  6 siblings, 0 replies; 8+ messages in thread
From: simon.marchi at polymtl dot ca @ 2023-10-05 17:24 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30935

Simon Marchi <simon.marchi at polymtl dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #6 from Simon Marchi <simon.marchi at polymtl dot ca> ---
Fixed by that commit.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-05 17:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-02 21:15 [Bug gdb/30935] New: Regression exposed by new gdb.python/py-progspace-events.exp test and native-extended-gdbserver board aburgess at redhat dot com
2023-10-02 21:15 ` [Bug gdb/30935] " aburgess at redhat dot com
2023-10-02 22:58 ` aburgess at redhat dot com
2023-10-03 13:56 ` simon.marchi at polymtl dot ca
2023-10-03 14:25 ` simon.marchi at polymtl dot ca
2023-10-03 16:23 ` aburgess at redhat dot com
2023-10-05 17:23 ` cvs-commit at gcc dot gnu.org
2023-10-05 17:24 ` simon.marchi at polymtl dot ca

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).