public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Keno Fischer <keno@juliacomputing.com>
To: gdb-patches@sourceware.org
Subject: [RFC PATCH] gdb/frame: Refine stack switch heuristic
Date: Mon, 20 Dec 2021 04:36:20 -0500	[thread overview]
Message-ID: <20211220093620.GA2968241@juliacomputing.com> (raw)

GDB includes a heuristic that terminates unwinding if GDB comes across
a stack frame transition that looks like it's going in the wrong
direction. Of course, in an application with perfect unwind info, such
a heuristic should not be required, but alas in such a world we live not.

Quite inconveniently, there are libraries that do cause the stack
to look as if its going in the wrong direction, for legitimate reasons,
e.g. because they're implementing segmented stacks or because they're
doing some sort of instrumentation that should run on a separate stack.
The trouble with this heuristic is that even if these libraries declare
their CFI carefully and perfectly, gdb will still terminate unwinding.

There is currently one workaround to disable this heuristic: Because
stack switching is used in the gcc segmented stack implementation,
the heuristic inside gdb is disabled if the function happens to be
called __morestack (which is the name of the gcc function
implementing the stack segmentation). Because of this, a host of
applications have started naming their functions __morestack in
an attempt to get gdb to disable this heuristic (e.g. [1][2]).

I was about to add this workaround to another application, but I
figured it might be worth seeing if we can do better in gdb instead.
The workaround works, but having a bunch of unrelated non-descript
__morestack functions in the backtrace, rather than a more useful
indication of why the stack is switching is not a great debugging
experience. It also complicates debugging of applications that have
CFI information available (because it was in memory when the
application crashed), but no symbols.

Here I propose the following extension to GDB's heuristic: If the
CFA of a frame is declared using DW_CFA_def_cfa_expression, i.e.
is a DWARF expression, then gdb disables the heuristic, because
clearly something fancy is going on with the stacks. In most cases
complicated stack switchers will likely be using this form of
the CFI since the computation of the CFA is non-trivial after
the stack has been switched to the new location (likely involving
reading a frame on the new stack to determine the old rsp). However,
even if the CFA computation is not as complicated (e.g. because
the stack switcher happens to save the CFA in a register), this
gives applications an easy low-overhead workaround to mark their
functions as a stack switcher without having to rename their symbols.

I think this is a reasonable tradeoff. DW_CFA_def_cfa_expression is
reasonably rare outside of code that does messy things with stack
computation, so it doesn't defang the heuristic while giving a clear
rule for when the heuristic is disabled. An alternatively could
be a more explicit way to mark stack switchers in the CFI that
does not involve requiring a symbol lookup, but that would probably
require more coordination with the standards authors and CFI
producers, which doesn't seem worthwhile at this juncture.

[1] https://github.com/rr-debugger/rr/blob/1a774d0b5946a19b291756092be699d5295c0dc1/src/preload/syscall_hook.S#L370-L382
[2] https://github.com/edef1c/libfringe/blob/master/src/arch/x86_64.rs#L64-L68

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
---
 gdb/dwarf2/frame.c    | 13 ++++++++++++-
 gdb/frame.c           | 26 +++++++++++++++++++++++---
 gdb/frame.h           | 19 +++++++++++++++++--
 gdb/guile/scm-frame.c |  2 +-
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/gdb/dwarf2/frame.c b/gdb/dwarf2/frame.c
index f3d38771708..832c4a51088 100644
--- a/gdb/dwarf2/frame.c
+++ b/gdb/dwarf2/frame.c
@@ -846,6 +846,10 @@ struct dwarf2_frame_cache
   /* Set if the return address column was marked as undefined.  */
   int undefined_retaddr;
 
+  /* Set if we heuristically think that this function is a stack
+     switcher */
+  int stack_switch_heuristic;
+
   /* Saved registers, indexed by GDB register number, not by DWARF
      register number.  */
   struct dwarf2_frame_state_reg *reg;
@@ -973,6 +977,10 @@ dwarf2_frame_cache (struct frame_info *this_frame, void **this_cache)
 	    execute_stack_op (fs.regs.cfa_exp, fs.regs.cfa_exp_len,
 			      cache->addr_size, this_frame, 0, 0,
 			      cache->per_objfile);
+    /* If the the CFI went through the trouble of specifying an
+       expression for the CFA, trust that the CFI here is correct
+       and a stack switch is potentially intended. */
+    cache->stack_switch_heuristic = 1;
 	  break;
 
 	default:
@@ -1123,6 +1131,8 @@ dwarf2_frame_this_id (struct frame_info *this_frame, void **this_cache,
     (*this_id) = frame_id_build_unavailable_stack (get_frame_func (this_frame));
   else if (cache->undefined_retaddr)
     return;
+  else if (cache->stack_switch_heuristic)
+    (*this_id) = frame_id_build_stack_switch (cache->cfa, get_frame_func (this_frame));
   else
     (*this_id) = frame_id_build (cache->cfa, get_frame_func (this_frame));
 }
@@ -1360,7 +1370,8 @@ dwarf2_frame_cfa (struct frame_info *this_frame)
 		_("can't compute CFA for this frame: "
 		  "required registers or memory are unavailable"));
 
-  if (get_frame_id (this_frame).stack_status != FID_STACK_VALID)
+  if (get_frame_id (this_frame).stack_status != FID_STACK_VALID &&
+      get_frame_id (this_frame).stack_status != FID_ALLOW_STACK_SWITCH)
     throw_error (NOT_AVAILABLE_ERROR,
 		_("can't compute CFA for this frame: "
 		  "frame base not available"));
diff --git a/gdb/frame.c b/gdb/frame.c
index 896d80d87bf..9bf7517f40c 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -226,11 +226,13 @@ frame_addr_hash (const void *ap)
   const struct frame_id f_id = frame->this_id.value;
   hashval_t hash = 0;
 
-  gdb_assert (f_id.stack_status != FID_STACK_INVALID
+  gdb_assert ((f_id.stack_status != FID_STACK_INVALID &&
+               f_id.stack_status != FID_ALLOW_STACK_SWITCH)
 	      || f_id.code_addr_p
 	      || f_id.special_addr_p);
 
-  if (f_id.stack_status == FID_STACK_VALID)
+  if (f_id.stack_status == FID_STACK_VALID ||
+      f_id.stack_status == FID_ALLOW_STACK_SWITCH)
     hash = iterative_hash (&f_id.stack_addr,
 			   sizeof (f_id.stack_addr), hash);
   if (f_id.code_addr_p)
@@ -721,6 +723,18 @@ frame_id_build (CORE_ADDR stack_addr, CORE_ADDR code_addr)
   return id;
 }
 
+struct frame_id
+frame_id_build_stack_switch (CORE_ADDR stack_addr, CORE_ADDR code_addr)
+{
+  struct frame_id id = null_frame_id;
+
+  id.stack_addr = stack_addr;
+  id.stack_status = FID_ALLOW_STACK_SWITCH;
+  id.code_addr = code_addr;
+  id.code_addr_p = true;
+  return id;
+}
+
 struct frame_id
 frame_id_build_wild (CORE_ADDR stack_addr)
 {
@@ -823,7 +837,13 @@ frame_id_inner (struct gdbarch *gdbarch, struct frame_id l, struct frame_id r)
 {
   bool inner;
 
-  if (l.stack_status != FID_STACK_VALID || r.stack_status != FID_STACK_VALID)
+  /**
+   * This heuristic is disabled for FID_ALLOW_STACK_SWITCH -> FID_STACK_VALID,
+   * but we do check the opposite order. */
+  if (l.stack_status == FID_ALLOW_STACK_SWITCH && r.stack_status == FID_STACK_VALID)
+    inner = false;
+  else if ((l.stack_status != FID_STACK_VALID && l.stack_status != FID_ALLOW_STACK_SWITCH)
+        || (r.stack_status != FID_STACK_VALID && r.stack_status != FID_ALLOW_STACK_SWITCH))
     /* Like NaN, any operation involving an invalid ID always fails.
        Likewise if either ID has an unavailable stack address.  */
     inner = false;
diff --git a/gdb/frame.h b/gdb/frame.h
index 2548846c1ed..1adec3a41d3 100644
--- a/gdb/frame.h
+++ b/gdb/frame.h
@@ -101,6 +101,14 @@ enum frame_id_stack_status
      outer frame are also of this type.  */
   FID_STACK_OUTER = 3,
 
+  /* Stack switch frame. This is heuristically inferred from the CFI of the
+     frame in question and disables stack layout heuristics that would otherwise
+     terminate unwinding. Note that the tagged frame's CFA is expected to be
+     inner to its caller (i.e. the stack switching happens during the tagged
+     function and its callee may have a new CFA). Otherwise equivalent to
+     FID_STACK_VALID. */
+  FID_ALLOW_STACK_SWITCH = 4,
+
   /* Stack address is unavailable.  I.e., there's a valid stack, but
      we don't know where it is (because memory or registers we'd
      compute it from were not collected).  */
@@ -129,7 +137,7 @@ struct frame_id
      wrong.
 
      This field is valid only if frame_id.stack_status is
-     FID_STACK_VALID.  It will be 0 for other
+     FID_STACK_VALID or FID_ALLOW_STACK_SWITCH.  It will be 0 for other
      FID_STACK_... statuses.  */
   CORE_ADDR stack_addr;
 
@@ -161,7 +169,7 @@ struct frame_id
   CORE_ADDR special_addr;
 
   /* Flags to indicate the above fields have valid contents.  */
-  ENUM_BITFIELD(frame_id_stack_status) stack_status : 3;
+  ENUM_BITFIELD(frame_id_stack_status) stack_status : 4;
   unsigned int code_addr_p : 1;
   unsigned int special_addr_p : 1;
 
@@ -234,6 +242,13 @@ extern bool frame_debug;
 extern struct frame_id frame_id_build (CORE_ADDR stack_addr,
 				       CORE_ADDR code_addr);
 
+/* Construct a stack-switcher frame ID.  The first parameter is the frame's constant
+   stack address (typically the outer-bound), and the second the
+   frame's constant code address (typically the entry point).
+   The special identifier address is set to indicate a wild card.  */
+extern struct frame_id frame_id_build_stack_switch (CORE_ADDR stack_addr,
+				       CORE_ADDR code_addr);
+
 /* Construct a special frame ID.  The first parameter is the frame's constant
    stack address (typically the outer-bound), the second is the
    frame's constant code address (typically the entry point),
diff --git a/gdb/guile/scm-frame.c b/gdb/guile/scm-frame.c
index eb32f9a2ef0..46845ae459e 100644
--- a/gdb/guile/scm-frame.c
+++ b/gdb/guile/scm-frame.c
@@ -87,7 +87,7 @@ frscm_hash_frame_smob (const void *p)
   const struct frame_id *fid = &f_smob->frame_id;
   hashval_t hash = htab_hash_pointer (f_smob->inferior);
 
-  if (fid->stack_status == FID_STACK_VALID)
+  if (fid->stack_status == FID_STACK_VALID || fid->stack_status == FID_ALLOW_STACK_SWITCH)
     hash = iterative_hash (&fid->stack_addr, sizeof (fid->stack_addr), hash);
   if (fid->code_addr_p)
     hash = iterative_hash (&fid->code_addr, sizeof (fid->code_addr), hash);
-- 
2.25.1


             reply	other threads:[~2021-12-20  9:36 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-20  9:36 Keno Fischer [this message]
2022-03-11 20:17 ` Tom Tromey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211220093620.GA2968241@juliacomputing.com \
    --to=keno@juliacomputing.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).