public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [RFC PATCH] gdb/frame: Refine stack switch heuristic
@ 2021-12-20  9:36 Keno Fischer
  2022-03-11 20:17 ` Tom Tromey
  0 siblings, 1 reply; 2+ messages in thread
From: Keno Fischer @ 2021-12-20  9:36 UTC (permalink / raw)
  To: gdb-patches

GDB includes a heuristic that terminates unwinding if GDB comes across
a stack frame transition that looks like it's going in the wrong
direction. Of course, in an application with perfect unwind info, such
a heuristic should not be required, but alas in such a world we live not.

Quite inconveniently, there are libraries that do cause the stack
to look as if its going in the wrong direction, for legitimate reasons,
e.g. because they're implementing segmented stacks or because they're
doing some sort of instrumentation that should run on a separate stack.
The trouble with this heuristic is that even if these libraries declare
their CFI carefully and perfectly, gdb will still terminate unwinding.

There is currently one workaround to disable this heuristic: Because
stack switching is used in the gcc segmented stack implementation,
the heuristic inside gdb is disabled if the function happens to be
called __morestack (which is the name of the gcc function
implementing the stack segmentation). Because of this, a host of
applications have started naming their functions __morestack in
an attempt to get gdb to disable this heuristic (e.g. [1][2]).

I was about to add this workaround to another application, but I
figured it might be worth seeing if we can do better in gdb instead.
The workaround works, but having a bunch of unrelated non-descript
__morestack functions in the backtrace, rather than a more useful
indication of why the stack is switching is not a great debugging
experience. It also complicates debugging of applications that have
CFI information available (because it was in memory when the
application crashed), but no symbols.

Here I propose the following extension to GDB's heuristic: If the
CFA of a frame is declared using DW_CFA_def_cfa_expression, i.e.
is a DWARF expression, then gdb disables the heuristic, because
clearly something fancy is going on with the stacks. In most cases
complicated stack switchers will likely be using this form of
the CFI since the computation of the CFA is non-trivial after
the stack has been switched to the new location (likely involving
reading a frame on the new stack to determine the old rsp). However,
even if the CFA computation is not as complicated (e.g. because
the stack switcher happens to save the CFA in a register), this
gives applications an easy low-overhead workaround to mark their
functions as a stack switcher without having to rename their symbols.

I think this is a reasonable tradeoff. DW_CFA_def_cfa_expression is
reasonably rare outside of code that does messy things with stack
computation, so it doesn't defang the heuristic while giving a clear
rule for when the heuristic is disabled. An alternatively could
be a more explicit way to mark stack switchers in the CFI that
does not involve requiring a symbol lookup, but that would probably
require more coordination with the standards authors and CFI
producers, which doesn't seem worthwhile at this juncture.

[1] https://github.com/rr-debugger/rr/blob/1a774d0b5946a19b291756092be699d5295c0dc1/src/preload/syscall_hook.S#L370-L382
[2] https://github.com/edef1c/libfringe/blob/master/src/arch/x86_64.rs#L64-L68

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
---
 gdb/dwarf2/frame.c    | 13 ++++++++++++-
 gdb/frame.c           | 26 +++++++++++++++++++++++---
 gdb/frame.h           | 19 +++++++++++++++++--
 gdb/guile/scm-frame.c |  2 +-
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/gdb/dwarf2/frame.c b/gdb/dwarf2/frame.c
index f3d38771708..832c4a51088 100644
--- a/gdb/dwarf2/frame.c
+++ b/gdb/dwarf2/frame.c
@@ -846,6 +846,10 @@ struct dwarf2_frame_cache
   /* Set if the return address column was marked as undefined.  */
   int undefined_retaddr;
 
+  /* Set if we heuristically think that this function is a stack
+     switcher */
+  int stack_switch_heuristic;
+
   /* Saved registers, indexed by GDB register number, not by DWARF
      register number.  */
   struct dwarf2_frame_state_reg *reg;
@@ -973,6 +977,10 @@ dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
 	    execute_stack_op (fs.regs.cfa_exp, fs.regs.cfa_exp_len,
 			      cache->addr_size, this_frame, 0, 0,
 			      cache->per_objfile);
+    /* If the the CFI went through the trouble of specifying an
+       expression for the CFA, trust that the CFI here is correct
+       and a stack switch is potentially intended. */
+    cache->stack_switch_heuristic = 1;
 	  break;
 
 	default:
@@ -1123,6 +1131,8 @@ dwarf2_frame_this_id (struct frame_info *this_frame, void **this_cache,
     (*this_id) = frame_id_build_unavailable_stack (get_frame_func (this_frame));
   else if (cache->undefined_retaddr)
     return;
+  else if (cache->stack_switch_heuristic)
+    (*this_id) = frame_id_build_stack_switch (cache->cfa, get_frame_func (this_frame));
   else
     (*this_id) = frame_id_build (cache->cfa, get_frame_func (this_frame));
 }
@@ -1360,7 +1370,8 @@ dwarf2_frame_cfa (struct frame_info *this_frame)
 		_("can't compute CFA for this frame: "
 		  "required registers or memory are unavailable"));
 
-  if (get_frame_id (this_frame).stack_status != FID_STACK_VALID)
+  if (get_frame_id (this_frame).stack_status != FID_STACK_VALID &&
+      get_frame_id (this_frame).stack_status != FID_ALLOW_STACK_SWITCH)
     throw_error (NOT_AVAILABLE_ERROR,
 		_("can't compute CFA for this frame: "
 		  "frame base not available"));
diff --git a/gdb/frame.c b/gdb/frame.c
index 896d80d87bf..9bf7517f40c 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -226,11 +226,13 @@ frame_addr_hash (const void *ap)
   const struct frame_id f_id = frame->this_id.value;
   hashval_t hash = 0;
 
-  gdb_assert (f_id.stack_status != FID_STACK_INVALID
+  gdb_assert ((f_id.stack_status != FID_STACK_INVALID &&
+               f_id.stack_status != FID_ALLOW_STACK_SWITCH)
 	      || f_id.code_addr_p
 	      || f_id.special_addr_p);
 
-  if (f_id.stack_status == FID_STACK_VALID)
+  if (f_id.stack_status == FID_STACK_VALID ||
+      f_id.stack_status == FID_ALLOW_STACK_SWITCH)
     hash = iterative_hash (&f_id.stack_addr,
 			   sizeof (f_id.stack_addr), hash);
   if (f_id.code_addr_p)
@@ -721,6 +723,18 @@ frame_id_build (CORE_ADDR stack_addr, CORE_ADDR code_addr)
   return id;
 }
 
+struct frame_id
+frame_id_build_stack_switch (CORE_ADDR stack_addr, CORE_ADDR code_addr)
+{
+  struct frame_id id = null_frame_id;
+
+  id.stack_addr = stack_addr;
+  id.stack_status = FID_ALLOW_STACK_SWITCH;
+  id.code_addr = code_addr;
+  id.code_addr_p = true;
+  return id;
+}
+
 struct frame_id
 frame_id_build_wild (CORE_ADDR stack_addr)
 {
@@ -823,7 +837,13 @@ frame_id_inner (struct gdbarch *gdbarch, struct frame_id l, struct frame_id r)
 {
   bool inner;
 
-  if (l.stack_status != FID_STACK_VALID || r.stack_status != FID_STACK_VALID)
+  /**
+   * This heuristic is disabled for FID_ALLOW_STACK_SWITCH -> FID_STACK_VALID,
+   * but we do check the opposite order. */
+  if (l.stack_status == FID_ALLOW_STACK_SWITCH && r.stack_status == FID_STACK_VALID)
+    inner = false;
+  else if ((l.stack_status != FID_STACK_VALID && l.stack_status != FID_ALLOW_STACK_SWITCH)
+        || (r.stack_status != FID_STACK_VALID && r.stack_status != FID_ALLOW_STACK_SWITCH))
     /* Like NaN, any operation involving an invalid ID always fails.
        Likewise if either ID has an unavailable stack address.  */
     inner = false;
diff --git a/gdb/frame.h b/gdb/frame.h
index 2548846c1ed..1adec3a41d3 100644
--- a/gdb/frame.h
+++ b/gdb/frame.h
@@ -101,6 +101,14 @@ enum frame_id_stack_status
      outer frame are also of this type.  */
   FID_STACK_OUTER = 3,
 
+  /* Stack switch frame. This is heuristically inferred from the CFI of the
+     frame in question and disables stack layout heuristics that would otherwise
+     terminate unwinding. Note that the tagged frame's CFA is expected to be
+     inner to its caller (i.e. the stack switching happens during the tagged
+     function and its callee may have a new CFA). Otherwise equivalent to
+     FID_STACK_VALID. */
+  FID_ALLOW_STACK_SWITCH = 4,
+
   /* Stack address is unavailable.  I.e., there's a valid stack, but
      we don't know where it is (because memory or registers we'd
      compute it from were not collected).  */
@@ -129,7 +137,7 @@ struct frame_id
      wrong.
 
      This field is valid only if frame_id.stack_status is
-     FID_STACK_VALID.  It will be 0 for other
+     FID_STACK_VALID or FID_ALLOW_STACK_SWITCH.  It will be 0 for other
      FID_STACK_... statuses.  */
   CORE_ADDR stack_addr;
 
@@ -161,7 +169,7 @@ struct frame_id
   CORE_ADDR special_addr;
 
   /* Flags to indicate the above fields have valid contents.  */
-  ENUM_BITFIELD(frame_id_stack_status) stack_status : 3;
+  ENUM_BITFIELD(frame_id_stack_status) stack_status : 4;
   unsigned int code_addr_p : 1;
   unsigned int special_addr_p : 1;
 
@@ -234,6 +242,13 @@ extern bool frame_debug;
 extern struct frame_id frame_id_build (CORE_ADDR stack_addr,
 				       CORE_ADDR code_addr);
 
+/* Construct a stack-switcher frame ID.  The first parameter is the frame's constant
+   stack address (typically the outer-bound), and the second the
+   frame's constant code address (typically the entry point).
+   The special identifier address is set to indicate a wild card.  */
+extern struct frame_id frame_id_build_stack_switch (CORE_ADDR stack_addr,
+				       CORE_ADDR code_addr);
+
 /* Construct a special frame ID.  The first parameter is the frame's constant
    stack address (typically the outer-bound), the second is the
    frame's constant code address (typically the entry point),
diff --git a/gdb/guile/scm-frame.c b/gdb/guile/scm-frame.c
index eb32f9a2ef0..46845ae459e 100644
--- a/gdb/guile/scm-frame.c
+++ b/gdb/guile/scm-frame.c
@@ -87,7 +87,7 @@ frscm_hash_frame_smob (const void *p)
   const struct frame_id *fid = &f_smob->frame_id;
   hashval_t hash = htab_hash_pointer (f_smob->inferior);
 
-  if (fid->stack_status == FID_STACK_VALID)
+  if (fid->stack_status == FID_STACK_VALID || fid->stack_status == FID_ALLOW_STACK_SWITCH)
     hash = iterative_hash (&fid->stack_addr, sizeof (fid->stack_addr), hash);
   if (fid->code_addr_p)
     hash = iterative_hash (&fid->code_addr, sizeof (fid->code_addr), hash);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] gdb/frame: Refine stack switch heuristic
  2021-12-20  9:36 [RFC PATCH] gdb/frame: Refine stack switch heuristic Keno Fischer
@ 2022-03-11 20:17 ` Tom Tromey
  0 siblings, 0 replies; 2+ messages in thread
From: Tom Tromey @ 2022-03-11 20:17 UTC (permalink / raw)
  To: Keno Fischer via Gdb-patches; +Cc: Keno Fischer

>>>>> Keno Fischer via Gdb-patches <gdb-patches@sourceware.org> writes:

Hi.  Thank you for the patch.

> Here I propose the following extension to GDB's heuristic: If the
> CFA of a frame is declared using DW_CFA_def_cfa_expression, i.e.
> is a DWARF expression, then gdb disables the heuristic, because
> clearly something fancy is going on with the stacks.

I suppose I'm ok with this.

> An alternatively could
> be a more explicit way to mark stack switchers in the CFI that
> does not involve requiring a symbol lookup, but that would probably
> require more coordination with the standards authors and CFI
> producers, which doesn't seem worthwhile at this juncture.

Yeah, I wouldn't recommend attempting that.

> +  /* Set if we heuristically think that this function is a stack
> +     switcher */
> +  int stack_switch_heuristic;

Might as well use bool here, then true/false elsewhere.

> +    (*this_id) = frame_id_build_stack_switch (cache->cfa, get_frame_func (this_frame));

Probably have to line-break after the ",".

> +  if (fid->stack_status == FID_STACK_VALID || fid->stack_status == FID_ALLOW_STACK_SWITCH)

Should line-break before the "||".

Is there any reasonable way to write a test case for this?

Tom

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-03-11 20:17 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20  9:36 [RFC PATCH] gdb/frame: Refine stack switch heuristic Keno Fischer
2022-03-11 20:17 ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).