public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/2] [gdb] Fix segfault in for_each_block
@ 2023-11-04 15:57 Tom de Vries
  2023-11-04 15:57 ` [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1 Tom de Vries
  2023-11-04 15:57 ` [PATCH 2/2] [gdb] Fix segfault in for_each_block, part 2 Tom de Vries
  0 siblings, 2 replies; 8+ messages in thread
From: Tom de Vries @ 2023-11-04 15:57 UTC (permalink / raw)
  To: gdb-patches

This patch series fixes PR gdb/30547, a segfault when running test-case
gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x).

There are two patches, each of them by themselves sufficient to no longer
trigger the segfault.

The root cause of the problem is that linux_is_uclinux, and consequently
gdbarch_has_shared_address_space returns an incorrect value.

The first patch makes gdb more robust against gdbarch_has_shared_address_space
returning incorrect values, by eliminating a call to it.

The second patch addresses the root cause.

Tested on top of trunk on x86_64-linux and ppc64le-linux.
Tested on top of gdb-14-branch on ppc64-linux.

[ I used gdb-14-branch for ppc64-linux, because I can't build trunk anymore
with system gcc 4.8.5 (CentOS-7), due to the recent c++17 requirement (and
just before that, some gcc bug in atomic support), and that's all I have
readily available on that machine. ]

There is still scope to fix things further.

When I started to investigate, I noticed that I only ran into the segfault on
ppc64 and s390x, two big-endian architectures, so I sort of expected to find an
endian-related problem.

Instead, the problem was ppc_linux_target_wordsize returning 4 instead of 8,
which causes gdb to interpret the 8-byte entry auxv vector using 4-byte
words, causing an incorrect linux_is_uclinux == true.

The same problem happens on ppc64le (ppc_linux_target_wordsize returns 4),
it's just that the incorrect word size doesn't change the outcome of:
- target_auxv_search (AT_NULL, &dummy) == 1, and
- target_auxv_search (AT_PAGESZ, &dummy) == 1
so linux_is_uclinux returns false, as it should.

This suggest a too forgiving parsing of the auxv vector, which should be made
more strict.

Finally, it should be fixed that ppc_linux_target_wordsize returns 4 in a
process with wordsize == 8.

I added an assert that PTRACE_PEEKUSER doesn't fail (errno != 0) and ran into
it in test-case gdb.base/access-mem-running.exp, during trying to "set a
breakpoint while the process is running".  It's clear that it's quite common
for this to happen, and it's surprising that this doesn't cause more problems.

I'll eventually file PRs for these two issues, for now my interest is to
backport at least one, possibly both patches from this series to fix this PR
on the gdb 14 release branch (and the 13.2 based distro packages I maintain).

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547

Tom de Vries (2):
  [gdb] Fix segfault in for_each_block, part 1
  [gdb] Fix segfault in for_each_block, part 2

 gdb/infrun.c    | 12 +++++++++++-
 gdb/progspace.c | 37 +++++++++++++++++++++++++++----------
 gdb/progspace.h | 11 ++++++++++-
 3 files changed, 48 insertions(+), 12 deletions(-)


base-commit: de2efa143e3652d69c278dd1eb10a856593917c0
-- 
2.35.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1
  2023-11-04 15:57 [PATCH 0/2] [gdb] Fix segfault in for_each_block Tom de Vries
@ 2023-11-04 15:57 ` Tom de Vries
  2023-11-06 15:24   ` Andrew Burgess
  2023-11-06 17:05   ` Simon Marchi
  2023-11-04 15:57 ` [PATCH 2/2] [gdb] Fix segfault in for_each_block, part 2 Tom de Vries
  1 sibling, 2 replies; 8+ messages in thread
From: Tom de Vries @ 2023-11-04 15:57 UTC (permalink / raw)
  To: gdb-patches

When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
on s390x), I run into:
...
(gdb) PASS: gdb.base/vfork-follow-parent.exp: \
  exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
  resolution_method=schedule-multiple: print unblock_parent = 1
continue^M
Continuing.^M
Reading symbols from vfork-follow-parent-exit...^M
^M
^M
Fatal signal: Segmentation fault^M
----- Backtrace -----^M
0x1027d3e7 gdb_internal_backtrace_1^M
        src/gdb/bt-utils.c:122^M
0x1027d54f _Z22gdb_internal_backtracev^M
        src/gdb/bt-utils.c:168^M
0x1057643f handle_fatal_signal^M
        src/gdb/event-top.c:889^M
0x10576677 handle_sigsegv^M
        src/gdb/event-top.c:962^M
0x3fffa7610477 ???^M
0x103f2144 for_each_block^M
        src/gdb/dcache.c:199^M
0x103f235b _Z17dcache_invalidateP13dcache_struct^M
        src/gdb/dcache.c:251^M
0x10bde8c7 _Z24target_dcache_invalidatev^M
        src/gdb/target-dcache.c:50^M
...
or similar.

The root cause for the segmentation fault is that linux_is_uclinux gives an
incorrect result: it should always return false, given that we're running on a
regular linux system, but instead it returns first true, then false.

In more detail, the segmentation fault happens as follows:
- a program space with an address space is created
- a second program space is about to be created. maybe_new_address_space
  is called, and because linux_is_uclinux returns true, maybe_new_address_space
  returns false, and no new address space is created
- a second program space with the same address space is created
- a program space is deleted. Because linux_is_uclinux now returns false,
  gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
  false, and the address space is deleted
- when gdb uses the address space of the remaining program space, we run into
  the segfault, because the address space is deleted.

Hardcoding linux_is_uclinux to false makes the test-case pass.

We leave addressing the root cause for the following commit in this series.

For now, prevent the segmentation fault by making the address space a refcounted
object.

This was already suggested here [1]:
...
A better solution might be to have the address spaces be reference counted
...

Tested on top of trunk on x86_64-linux and ppc64le-linux.
Tested on top of gdb-14-branch on ppc64-linux.

PR gdb/30547
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547

[1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
---
 gdb/progspace.c | 37 +++++++++++++++++++++++++++----------
 gdb/progspace.h | 11 ++++++++++-
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/gdb/progspace.c b/gdb/progspace.c
index 839707e9d71..4fea21f0ca1 100644
--- a/gdb/progspace.c
+++ b/gdb/progspace.c
@@ -96,9 +96,9 @@ remove_program_space (program_space *pspace)
 /* See progspace.h.  */
 
 program_space::program_space (address_space *aspace_)
-  : num (++last_program_space_num),
-    aspace (aspace_)
+  : num (++last_program_space_num)
 {
+  set_aspace (aspace_);
   program_spaces.push_back (this);
   gdb::observers::new_program_space.notify (this);
 }
@@ -122,8 +122,7 @@ program_space::~program_space ()
   /* Defer breakpoint re-set because we don't want to create new
      locations for this pspace which we're tearing down.  */
   clear_symtab_users (SYMFILE_DEFER_BP_RESET);
-  if (!gdbarch_has_shared_address_space (current_inferior ()->arch ()))
-    delete this->aspace;
+  reset_aspace ();
 }
 
 /* See progspace.h.  */
@@ -409,20 +408,19 @@ update_address_spaces (void)
 
   init_address_spaces ();
 
+  for (struct program_space *pspace : program_spaces)
+    pspace->reset_aspace ();
+
   if (shared_aspace)
     {
       struct address_space *aspace = new address_space ();
 
-      delete current_program_space->aspace;
       for (struct program_space *pspace : program_spaces)
-	pspace->aspace = aspace;
+	pspace->set_aspace (aspace);
     }
   else
     for (struct program_space *pspace : program_spaces)
-      {
-	delete pspace->aspace;
-	pspace->aspace = new address_space ();
-      }
+      pspace->set_aspace (new address_space ());
 
   for (inferior *inf : all_inferiors ())
     if (gdbarch_has_global_solist (current_inferior ()->arch ()))
@@ -433,8 +431,27 @@ update_address_spaces (void)
 
 \f
 
+void
+program_space::set_aspace (struct address_space *aspace_)
+{
+  aspace = aspace_;
+
+  aspace->incref ();
+}
+
 /* See progspace.h.  */
 
+void
+program_space::reset_aspace ()
+{
+  aspace->decref ();
+
+  if (aspace->refcount () == 0)
+    delete aspace;
+
+  aspace = nullptr;
+}
+
 void
 program_space::clear_solib_cache ()
 {
diff --git a/gdb/progspace.h b/gdb/progspace.h
index a22e427400e..065ca38e255 100644
--- a/gdb/progspace.h
+++ b/gdb/progspace.h
@@ -336,6 +336,10 @@ struct program_space
      make breakpoints global).  */
   struct address_space *aspace = NULL;
 
+  void set_aspace (struct address_space *aspace);
+
+  void reset_aspace ();
+
   /* True if this program space's section offsets don't yet represent
      the final offsets of the "live" address space (that is, the
      section addresses still require the relocation offsets to be
@@ -384,12 +388,17 @@ struct program_space
 /* An address space.  It is used for comparing if
    pspaces/inferior/threads see the same address space and for
    associating caches to each address space.  */
-struct address_space
+struct address_space : public refcounted_object
 {
   /* Create a new address space object, and add it to the list.  */
   address_space ();
   DISABLE_COPY_AND_ASSIGN (address_space);
 
+  ~address_space ()
+  {
+    gdb_assert (refcount () == 0);
+  }
+
   /* Returns the integer address space id of this address space.  */
   int num () const
   {
-- 
2.35.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] [gdb] Fix segfault in for_each_block, part 2
  2023-11-04 15:57 [PATCH 0/2] [gdb] Fix segfault in for_each_block Tom de Vries
  2023-11-04 15:57 ` [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1 Tom de Vries
@ 2023-11-04 15:57 ` Tom de Vries
  1 sibling, 0 replies; 8+ messages in thread
From: Tom de Vries @ 2023-11-04 15:57 UTC (permalink / raw)
  To: gdb-patches

The previous commit describes PR gdb/30547, a segfault when running test-case
gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x).

The root cause for the segmentation fault is that linux_is_uclinux gives an
incorrect result: it returns true instead of false.

So, why does linux_is_uclinux:
...
int
linux_is_uclinux (void)
{
  CORE_ADDR dummy;

  return (target_auxv_search (AT_NULL, &dummy) > 0
	  && target_auxv_search (AT_PAGESZ, &dummy) == 0);
...
return true?

This is because ppc_linux_target_wordsize returns 4 instead of 8, causing
ppc_linux_nat_target::auxv_parse to misinterpret the auxv vector.

So, why does ppc_linux_target_wordsize:
...
int
ppc_linux_target_wordsize (int tid)
{
  int wordsize = 4;

  /* Check for 64-bit inferior process.	 This is the case when the host is
     64-bit, and in addition the top bit of the MSR register is set.  */
  long msr;

  errno = 0;
  msr = (long) ptrace (PTRACE_PEEKUSER, tid, PT_MSR * 8, 0);
  if (errno == 0 && ppc64_64bit_inferior_p (msr))
    wordsize = 8;

  return wordsize;
}
...
return 4?

Specifically, we get this result because because tid == 0, so we get
errno == ESRCH.

The tid == 0 is caused by the switch_to_no_thread in
handle_vfork_child_exec_or_exit:
...
	  /* Switch to no-thread while running clone_program_space, so
	     that clone_program_space doesn't want to read the
	     selected frame of a dead process.  */
	  scoped_restore_current_thread restore_thread;
	  switch_to_no_thread ();

	  inf->pspace = new program_space (maybe_new_address_space ());
...
but moving the maybe_new_address_space call to before that gives us the
same result.  The tid is no longer 0, but we still get ESRCH because the
thread has exited.

Fix this in handle_vfork_child_exec_or_exit by doing the
maybe_new_address_space call in the context of the vfork parent.

Tested on top of trunk on x86_64-linux and ppc64le-linux.
Tested on top of gdb-14-branch on ppc64-linux.

PR gdb/30547
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
---
 gdb/infrun.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4fde96800fb..834718726a4 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1105,13 +1105,23 @@ handle_vfork_child_exec_or_exit (int exec)
 	     go ahead and create a new one for this exiting
 	     inferior.  */
 
+	  struct address_space *aspace;
+	  {
+	    /* Temporarily switch to the vfork parent, to facilitate ptrace
+	       calls done during maybe_new_address_space.  */
+	    scoped_restore save_inferior_ptid
+	      = make_scoped_restore (&inferior_ptid);
+	    inferior_ptid = ptid_t (vfork_parent->pid);
+	    aspace = maybe_new_address_space ();
+	  }
+
 	  /* Switch to no-thread while running clone_program_space, so
 	     that clone_program_space doesn't want to read the
 	     selected frame of a dead process.  */
 	  scoped_restore_current_thread restore_thread;
 	  switch_to_no_thread ();
 
-	  inf->pspace = new program_space (maybe_new_address_space ());
+	  inf->pspace = new program_space (aspace);
 	  inf->aspace = inf->pspace->aspace;
 	  set_current_program_space (inf->pspace);
 	  inf->removable = true;
-- 
2.35.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1
  2023-11-04 15:57 ` [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1 Tom de Vries
@ 2023-11-06 15:24   ` Andrew Burgess
  2023-11-07 13:28     ` Tom de Vries
  2023-11-06 17:05   ` Simon Marchi
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Burgess @ 2023-11-06 15:24 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches

Tom de Vries <tdevries@suse.de> writes:

> When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
> on s390x), I run into:
> ...
> (gdb) PASS: gdb.base/vfork-follow-parent.exp: \
>   exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
>   resolution_method=schedule-multiple: print unblock_parent = 1
> continue^M
> Continuing.^M
> Reading symbols from vfork-follow-parent-exit...^M
> ^M
> ^M
> Fatal signal: Segmentation fault^M
> ----- Backtrace -----^M
> 0x1027d3e7 gdb_internal_backtrace_1^M
>         src/gdb/bt-utils.c:122^M
> 0x1027d54f _Z22gdb_internal_backtracev^M
>         src/gdb/bt-utils.c:168^M
> 0x1057643f handle_fatal_signal^M
>         src/gdb/event-top.c:889^M
> 0x10576677 handle_sigsegv^M
>         src/gdb/event-top.c:962^M
> 0x3fffa7610477 ???^M
> 0x103f2144 for_each_block^M
>         src/gdb/dcache.c:199^M
> 0x103f235b _Z17dcache_invalidateP13dcache_struct^M
>         src/gdb/dcache.c:251^M
> 0x10bde8c7 _Z24target_dcache_invalidatev^M
>         src/gdb/target-dcache.c:50^M
> ...
> or similar.
>
> The root cause for the segmentation fault is that linux_is_uclinux gives an
> incorrect result: it should always return false, given that we're running on a
> regular linux system, but instead it returns first true, then false.
>
> In more detail, the segmentation fault happens as follows:
> - a program space with an address space is created
> - a second program space is about to be created. maybe_new_address_space
>   is called, and because linux_is_uclinux returns true, maybe_new_address_space
>   returns false, and no new address space is created
> - a second program space with the same address space is created
> - a program space is deleted. Because linux_is_uclinux now returns false,
>   gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
>   false, and the address space is deleted
> - when gdb uses the address space of the remaining program space, we run into
>   the segfault, because the address space is deleted.
>
> Hardcoding linux_is_uclinux to false makes the test-case pass.
>
> We leave addressing the root cause for the following commit in this series.
>
> For now, prevent the segmentation fault by making the address space a refcounted
> object.
>
> This was already suggested here [1]:
> ...
> A better solution might be to have the address spaces be reference counted
> ...
>
> Tested on top of trunk on x86_64-linux and ppc64le-linux.
> Tested on top of gdb-14-branch on ppc64-linux.
>
> PR gdb/30547
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
>
> [1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
> ---
>  gdb/progspace.c | 37 +++++++++++++++++++++++++++----------
>  gdb/progspace.h | 11 ++++++++++-
>  2 files changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/gdb/progspace.c b/gdb/progspace.c
> index 839707e9d71..4fea21f0ca1 100644
> --- a/gdb/progspace.c
> +++ b/gdb/progspace.c
> @@ -96,9 +96,9 @@ remove_program_space (program_space *pspace)
>  /* See progspace.h.  */
>  
>  program_space::program_space (address_space *aspace_)
> -  : num (++last_program_space_num),
> -    aspace (aspace_)
> +  : num (++last_program_space_num)
>  {
> +  set_aspace (aspace_);
>    program_spaces.push_back (this);
>    gdb::observers::new_program_space.notify (this);
>  }
> @@ -122,8 +122,7 @@ program_space::~program_space ()
>    /* Defer breakpoint re-set because we don't want to create new
>       locations for this pspace which we're tearing down.  */
>    clear_symtab_users (SYMFILE_DEFER_BP_RESET);
> -  if (!gdbarch_has_shared_address_space (current_inferior ()->arch ()))
> -    delete this->aspace;
> +  reset_aspace ();
>  }
>  
>  /* See progspace.h.  */
> @@ -409,20 +408,19 @@ update_address_spaces (void)
>  
>    init_address_spaces ();
>  
> +  for (struct program_space *pspace : program_spaces)
> +    pspace->reset_aspace ();
> +
>    if (shared_aspace)
>      {
>        struct address_space *aspace = new address_space ();
>  
> -      delete current_program_space->aspace;
>        for (struct program_space *pspace : program_spaces)
> -	pspace->aspace = aspace;
> +	pspace->set_aspace (aspace);
>      }
>    else
>      for (struct program_space *pspace : program_spaces)
> -      {
> -	delete pspace->aspace;
> -	pspace->aspace = new address_space ();
> -      }
> +      pspace->set_aspace (new address_space ());
>  
>    for (inferior *inf : all_inferiors ())
>      if (gdbarch_has_global_solist (current_inferior ()->arch ()))
> @@ -433,8 +431,27 @@ update_address_spaces (void)
>  
>  \f
>  
> +void
> +program_space::set_aspace (struct address_space *aspace_)
> +{
> +  aspace = aspace_;
> +
> +  aspace->incref ();
> +}
> +
>  /* See progspace.h.  */
>  
> +void
> +program_space::reset_aspace ()
> +{
> +  aspace->decref ();
> +
> +  if (aspace->refcount () == 0)
> +    delete aspace;
> +
> +  aspace = nullptr;
> +}

I wouldn't have expected the reference counting to be done manually like
this.  I would have expected either:

  * Use a std::shared_ptr<address_space> within program_space, and then
    either also hold a std::shared_ptr within the inferior too, or
    potentially loan out raw pointers (to the inferior) using
    std::shared_ptr::get, or

  * Create a new type using gdb::ref_ptr ... but thinking about it,
    given the reference counting policy on this would be pretty vanilla,
    if you had done this, I think I'd be asking why not just use
    std::shared_ptr.

> +
>  void
>  program_space::clear_solib_cache ()
>  {
> diff --git a/gdb/progspace.h b/gdb/progspace.h
> index a22e427400e..065ca38e255 100644
> --- a/gdb/progspace.h
> +++ b/gdb/progspace.h
> @@ -336,6 +336,10 @@ struct program_space
>       make breakpoints global).  */
>    struct address_space *aspace = NULL;
>  
> +  void set_aspace (struct address_space *aspace);
> +
> +  void reset_aspace ();

These should have explanatory comments.  It feels like at a minimum
aspace should be made private, though I really think its type should be
changed to something that manages the reference counting for us.

Thanks,
Andrew

> +
>    /* True if this program space's section offsets don't yet represent
>       the final offsets of the "live" address space (that is, the
>       section addresses still require the relocation offsets to be
> @@ -384,12 +388,17 @@ struct program_space
>  /* An address space.  It is used for comparing if
>     pspaces/inferior/threads see the same address space and for
>     associating caches to each address space.  */
> -struct address_space
> +struct address_space : public refcounted_object
>  {
>    /* Create a new address space object, and add it to the list.  */
>    address_space ();
>    DISABLE_COPY_AND_ASSIGN (address_space);
>  
> +  ~address_space ()
> +  {
> +    gdb_assert (refcount () == 0);
> +  }
> +
>    /* Returns the integer address space id of this address space.  */
>    int num () const
>    {
> -- 
> 2.35.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1
  2023-11-04 15:57 ` [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1 Tom de Vries
  2023-11-06 15:24   ` Andrew Burgess
@ 2023-11-06 17:05   ` Simon Marchi
  2023-11-07 11:16     ` Andrew Burgess
  2023-11-07 13:32     ` Tom de Vries
  1 sibling, 2 replies; 8+ messages in thread
From: Simon Marchi @ 2023-11-06 17:05 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches

On 11/4/23 11:57, Tom de Vries wrote:
> When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
> on s390x), I run into:
> ...
> (gdb) PASS: gdb.base/vfork-follow-parent.exp: \
>   exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
>   resolution_method=schedule-multiple: print unblock_parent = 1
> continue^M
> Continuing.^M
> Reading symbols from vfork-follow-parent-exit...^M
> ^M
> ^M
> Fatal signal: Segmentation fault^M
> ----- Backtrace -----^M
> 0x1027d3e7 gdb_internal_backtrace_1^M
>         src/gdb/bt-utils.c:122^M
> 0x1027d54f _Z22gdb_internal_backtracev^M
>         src/gdb/bt-utils.c:168^M
> 0x1057643f handle_fatal_signal^M
>         src/gdb/event-top.c:889^M
> 0x10576677 handle_sigsegv^M
>         src/gdb/event-top.c:962^M
> 0x3fffa7610477 ???^M
> 0x103f2144 for_each_block^M
>         src/gdb/dcache.c:199^M
> 0x103f235b _Z17dcache_invalidateP13dcache_struct^M
>         src/gdb/dcache.c:251^M
> 0x10bde8c7 _Z24target_dcache_invalidatev^M
>         src/gdb/target-dcache.c:50^M
> ...
> or similar.
> 
> The root cause for the segmentation fault is that linux_is_uclinux gives an
> incorrect result: it should always return false, given that we're running on a
> regular linux system, but instead it returns first true, then false.
> 
> In more detail, the segmentation fault happens as follows:
> - a program space with an address space is created
> - a second program space is about to be created. maybe_new_address_space
>   is called, and because linux_is_uclinux returns true, maybe_new_address_space
>   returns false, and no new address space is created
> - a second program space with the same address space is created
> - a program space is deleted. Because linux_is_uclinux now returns false,
>   gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
>   false, and the address space is deleted
> - when gdb uses the address space of the remaining program space, we run into
>   the segfault, because the address space is deleted.
> 
> Hardcoding linux_is_uclinux to false makes the test-case pass.
> 
> We leave addressing the root cause for the following commit in this series.
> 
> For now, prevent the segmentation fault by making the address space a refcounted
> object.
> 
> This was already suggested here [1]:
> ...
> A better solution might be to have the address spaces be reference counted
> ...

That reminded me of a patch I started to work on a while ago to remove
program_space::aspace, which I have to get back to.  The rationale is
that there might be a 1:N relation ship between one program space and
many address spaces (see diagram in progspace.h), so having this one
pointer does not represent well the relationship.  An inferior is always
bound to a single address space, so my patch changes the code to use
inferior::aspace instead.

I think that even with my patch, we would still have the problem of
knowing when to delete an address space.  When you have many inferiors
sharing an address space, and you delete an inferior, how do you know if
you should delete the address space?  Perhaps we can implement a method
like program_space::empty, but for address_space.

However, I still think that using refcounting for address spaces (and
perhaps for program spaces too?) is a good solution.  It would spare us
of having to deal with the logic of knowing when address space is no
longer used.  Refcounting isn't always the right solution, but I don't
see any downsides in this case.

Implementation question: is there any reason you didn't choose
std::shared_ptr?  For objects that are new'ed / delete'd, that seems
like a good choice.

Simon

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1
  2023-11-06 17:05   ` Simon Marchi
@ 2023-11-07 11:16     ` Andrew Burgess
  2023-11-07 13:32     ` Tom de Vries
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Burgess @ 2023-11-07 11:16 UTC (permalink / raw)
  To: Simon Marchi, Tom de Vries, gdb-patches

Simon Marchi <simark@simark.ca> writes:

> On 11/4/23 11:57, Tom de Vries wrote:
>> When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
>> on s390x), I run into:
>> ...
>> (gdb) PASS: gdb.base/vfork-follow-parent.exp: \
>>   exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
>>   resolution_method=schedule-multiple: print unblock_parent = 1
>> continue^M
>> Continuing.^M
>> Reading symbols from vfork-follow-parent-exit...^M
>> ^M
>> ^M
>> Fatal signal: Segmentation fault^M
>> ----- Backtrace -----^M
>> 0x1027d3e7 gdb_internal_backtrace_1^M
>>         src/gdb/bt-utils.c:122^M
>> 0x1027d54f _Z22gdb_internal_backtracev^M
>>         src/gdb/bt-utils.c:168^M
>> 0x1057643f handle_fatal_signal^M
>>         src/gdb/event-top.c:889^M
>> 0x10576677 handle_sigsegv^M
>>         src/gdb/event-top.c:962^M
>> 0x3fffa7610477 ???^M
>> 0x103f2144 for_each_block^M
>>         src/gdb/dcache.c:199^M
>> 0x103f235b _Z17dcache_invalidateP13dcache_struct^M
>>         src/gdb/dcache.c:251^M
>> 0x10bde8c7 _Z24target_dcache_invalidatev^M
>>         src/gdb/target-dcache.c:50^M
>> ...
>> or similar.
>> 
>> The root cause for the segmentation fault is that linux_is_uclinux gives an
>> incorrect result: it should always return false, given that we're running on a
>> regular linux system, but instead it returns first true, then false.
>> 
>> In more detail, the segmentation fault happens as follows:
>> - a program space with an address space is created
>> - a second program space is about to be created. maybe_new_address_space
>>   is called, and because linux_is_uclinux returns true, maybe_new_address_space
>>   returns false, and no new address space is created
>> - a second program space with the same address space is created
>> - a program space is deleted. Because linux_is_uclinux now returns false,
>>   gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
>>   false, and the address space is deleted
>> - when gdb uses the address space of the remaining program space, we run into
>>   the segfault, because the address space is deleted.
>> 
>> Hardcoding linux_is_uclinux to false makes the test-case pass.
>> 
>> We leave addressing the root cause for the following commit in this series.
>> 
>> For now, prevent the segmentation fault by making the address space a refcounted
>> object.
>> 
>> This was already suggested here [1]:
>> ...
>> A better solution might be to have the address spaces be reference counted
>> ...
>
> That reminded me of a patch I started to work on a while ago to remove
> program_space::aspace, which I have to get back to.  The rationale is
> that there might be a 1:N relation ship between one program space and
> many address spaces (see diagram in progspace.h), so having this one
> pointer does not represent well the relationship.  An inferior is always
> bound to a single address space, so my patch changes the code to use
> inferior::aspace instead.

I also started looking at this after our previous discussion on
address_space pointer sharing :)

>
> I think that even with my patch, we would still have the problem of
> knowing when to delete an address space.  When you have many inferiors
> sharing an address space, and you delete an inferior, how do you know if
> you should delete the address space?  Perhaps we can implement a method
> like program_space::empty, but for address_space.
>
> However, I still think that using refcounting for address spaces (and
> perhaps for program spaces too?) is a good solution.  It would spare us
> of having to deal with the logic of knowing when address space is no
> longer used.  Refcounting isn't always the right solution, but I don't
> see any downsides in this case.

Agreed.  The biggest yuck for me was finding the address_space to assign
to a new inferior when in the shared address_space case.  Currently, the
program_space serves as the authority that hands out the single shared
address_space.

With the address_space removed from the program_space, you now need to
find a sibling inferior and take a reference to their address space.
Not a show stopping issue by any means, I just never got around to
finishing this part of the task.

Thanks,
Andrew

>
> Implementation question: is there any reason you didn't choose
> std::shared_ptr?  For objects that are new'ed / delete'd, that seems
> like a good choice.
>
> Simon


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1
  2023-11-06 15:24   ` Andrew Burgess
@ 2023-11-07 13:28     ` Tom de Vries
  0 siblings, 0 replies; 8+ messages in thread
From: Tom de Vries @ 2023-11-07 13:28 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

On 11/6/23 16:24, Andrew Burgess wrote:
> Tom de Vries <tdevries@suse.de> writes:
> 
>> When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
>> on s390x), I run into:
>> ...
>> (gdb) PASS: gdb.base/vfork-follow-parent.exp: \
>>    exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
>>    resolution_method=schedule-multiple: print unblock_parent = 1
>> continue^M
>> Continuing.^M
>> Reading symbols from vfork-follow-parent-exit...^M
>> ^M
>> ^M
>> Fatal signal: Segmentation fault^M
>> ----- Backtrace -----^M
>> 0x1027d3e7 gdb_internal_backtrace_1^M
>>          src/gdb/bt-utils.c:122^M
>> 0x1027d54f _Z22gdb_internal_backtracev^M
>>          src/gdb/bt-utils.c:168^M
>> 0x1057643f handle_fatal_signal^M
>>          src/gdb/event-top.c:889^M
>> 0x10576677 handle_sigsegv^M
>>          src/gdb/event-top.c:962^M
>> 0x3fffa7610477 ???^M
>> 0x103f2144 for_each_block^M
>>          src/gdb/dcache.c:199^M
>> 0x103f235b _Z17dcache_invalidateP13dcache_struct^M
>>          src/gdb/dcache.c:251^M
>> 0x10bde8c7 _Z24target_dcache_invalidatev^M
>>          src/gdb/target-dcache.c:50^M
>> ...
>> or similar.
>>
>> The root cause for the segmentation fault is that linux_is_uclinux gives an
>> incorrect result: it should always return false, given that we're running on a
>> regular linux system, but instead it returns first true, then false.
>>
>> In more detail, the segmentation fault happens as follows:
>> - a program space with an address space is created
>> - a second program space is about to be created. maybe_new_address_space
>>    is called, and because linux_is_uclinux returns true, maybe_new_address_space
>>    returns false, and no new address space is created
>> - a second program space with the same address space is created
>> - a program space is deleted. Because linux_is_uclinux now returns false,
>>    gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
>>    false, and the address space is deleted
>> - when gdb uses the address space of the remaining program space, we run into
>>    the segfault, because the address space is deleted.
>>
>> Hardcoding linux_is_uclinux to false makes the test-case pass.
>>
>> We leave addressing the root cause for the following commit in this series.
>>
>> For now, prevent the segmentation fault by making the address space a refcounted
>> object.
>>
>> This was already suggested here [1]:
>> ...
>> A better solution might be to have the address spaces be reference counted
>> ...
>>
>> Tested on top of trunk on x86_64-linux and ppc64le-linux.
>> Tested on top of gdb-14-branch on ppc64-linux.
>>
>> PR gdb/30547
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
>>
>> [1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
>> ---
>>   gdb/progspace.c | 37 +++++++++++++++++++++++++++----------
>>   gdb/progspace.h | 11 ++++++++++-
>>   2 files changed, 37 insertions(+), 11 deletions(-)
>>
>> diff --git a/gdb/progspace.c b/gdb/progspace.c
>> index 839707e9d71..4fea21f0ca1 100644
>> --- a/gdb/progspace.c
>> +++ b/gdb/progspace.c
>> @@ -96,9 +96,9 @@ remove_program_space (program_space *pspace)
>>   /* See progspace.h.  */
>>   
>>   program_space::program_space (address_space *aspace_)
>> -  : num (++last_program_space_num),
>> -    aspace (aspace_)
>> +  : num (++last_program_space_num)
>>   {
>> +  set_aspace (aspace_);
>>     program_spaces.push_back (this);
>>     gdb::observers::new_program_space.notify (this);
>>   }
>> @@ -122,8 +122,7 @@ program_space::~program_space ()
>>     /* Defer breakpoint re-set because we don't want to create new
>>        locations for this pspace which we're tearing down.  */
>>     clear_symtab_users (SYMFILE_DEFER_BP_RESET);
>> -  if (!gdbarch_has_shared_address_space (current_inferior ()->arch ()))
>> -    delete this->aspace;
>> +  reset_aspace ();
>>   }
>>   
>>   /* See progspace.h.  */
>> @@ -409,20 +408,19 @@ update_address_spaces (void)
>>   
>>     init_address_spaces ();
>>   
>> +  for (struct program_space *pspace : program_spaces)
>> +    pspace->reset_aspace ();
>> +
>>     if (shared_aspace)
>>       {
>>         struct address_space *aspace = new address_space ();
>>   
>> -      delete current_program_space->aspace;
>>         for (struct program_space *pspace : program_spaces)
>> -	pspace->aspace = aspace;
>> +	pspace->set_aspace (aspace);
>>       }
>>     else
>>       for (struct program_space *pspace : program_spaces)
>> -      {
>> -	delete pspace->aspace;
>> -	pspace->aspace = new address_space ();
>> -      }
>> +      pspace->set_aspace (new address_space ());
>>   
>>     for (inferior *inf : all_inferiors ())
>>       if (gdbarch_has_global_solist (current_inferior ()->arch ()))
>> @@ -433,8 +431,27 @@ update_address_spaces (void)
>>   
>>   \f
>>   
>> +void
>> +program_space::set_aspace (struct address_space *aspace_)
>> +{
>> +  aspace = aspace_;
>> +
>> +  aspace->incref ();
>> +}
>> +
>>   /* See progspace.h.  */
>>   
>> +void
>> +program_space::reset_aspace ()
>> +{
>> +  aspace->decref ();
>> +
>> +  if (aspace->refcount () == 0)
>> +    delete aspace;
>> +
>> +  aspace = nullptr;
>> +}
> 
> I wouldn't have expected the reference counting to be done manually like
> this.  I would have expected either:
> 
>    * Use a std::shared_ptr<address_space> within program_space, and then
>      either also hold a std::shared_ptr within the inferior too, or
>      potentially loan out raw pointers (to the inferior) using
>      std::shared_ptr::get, or
> 

Hi Andrew,

thanks for the review.

I've submitted a v2 ( 
https://sourceware.org/pipermail/gdb-patches/2023-November/203850.html ) 
that uses std::shared_ptr for the address_space for both the 
program_space and the inferior.

>    * Create a new type using gdb::ref_ptr ... but thinking about it,
>      given the reference counting policy on this would be pretty vanilla,
>      if you had done this, I think I'd be asking why not just use
>      std::shared_ptr.
> 
>> +
>>   void
>>   program_space::clear_solib_cache ()
>>   {
>> diff --git a/gdb/progspace.h b/gdb/progspace.h
>> index a22e427400e..065ca38e255 100644
>> --- a/gdb/progspace.h
>> +++ b/gdb/progspace.h
>> @@ -336,6 +336,10 @@ struct program_space
>>        make breakpoints global).  */
>>     struct address_space *aspace = NULL;
>>   
>> +  void set_aspace (struct address_space *aspace);
>> +
>> +  void reset_aspace ();
> 
> These should have explanatory comments. 

These two functions are gone in the v2.

> It feels like at a minimum
> aspace should be made private, though I really think its type should be
> changed to something that manages the reference counting for us.
> 

In the v2 I've left aspace non-private.

I could add a follow-up patch that makes a aspace private, but for 
backporting purposes I'd like to keep this patch a simple as possible.

Thanks,
- Tom

> Thanks,
> Andrew
> 
>> +
>>     /* True if this program space's section offsets don't yet represent
>>        the final offsets of the "live" address space (that is, the
>>        section addresses still require the relocation offsets to be
>> @@ -384,12 +388,17 @@ struct program_space
>>   /* An address space.  It is used for comparing if
>>      pspaces/inferior/threads see the same address space and for
>>      associating caches to each address space.  */
>> -struct address_space
>> +struct address_space : public refcounted_object
>>   {
>>     /* Create a new address space object, and add it to the list.  */
>>     address_space ();
>>     DISABLE_COPY_AND_ASSIGN (address_space);
>>   
>> +  ~address_space ()
>> +  {
>> +    gdb_assert (refcount () == 0);
>> +  }
>> +
>>     /* Returns the integer address space id of this address space.  */
>>     int num () const
>>     {
>> -- 
>> 2.35.3
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1
  2023-11-06 17:05   ` Simon Marchi
  2023-11-07 11:16     ` Andrew Burgess
@ 2023-11-07 13:32     ` Tom de Vries
  1 sibling, 0 replies; 8+ messages in thread
From: Tom de Vries @ 2023-11-07 13:32 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 11/6/23 18:05, Simon Marchi wrote:
> Implementation question: is there any reason you didn't choose 
> std::shared_ptr? For objects that are new'ed / delete'd, that seems like 
> a good choice.

No particular reason other than that I'm unfamiliar with std::shared_ptr.

In the v2 ( 
https://sourceware.org/pipermail/gdb-patches/2023-November/203850.html ) 
I've used std::shared_ptr.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-11-07 13:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-04 15:57 [PATCH 0/2] [gdb] Fix segfault in for_each_block Tom de Vries
2023-11-04 15:57 ` [PATCH 1/2] [gdb] Fix segfault in for_each_block, part 1 Tom de Vries
2023-11-06 15:24   ` Andrew Burgess
2023-11-07 13:28     ` Tom de Vries
2023-11-06 17:05   ` Simon Marchi
2023-11-07 11:16     ` Andrew Burgess
2023-11-07 13:32     ` Tom de Vries
2023-11-04 15:57 ` [PATCH 2/2] [gdb] Fix segfault in for_each_block, part 2 Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).