public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Simon Marchi <simon.marchi@polymtl.ca>
To: Zoran Zaric <zoran.zaric@amd.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH v3 01/17] Replace the symbol needs evaluator with a parser
Date: Mon, 7 Jun 2021 17:48:14 -0400	[thread overview]
Message-ID: <395847ba-8138-018f-f618-528d1a22e9fa@polymtl.ca> (raw)
In-Reply-To: <20210528154648.60881-2-zoran.zaric@amd.com>

Hi Zoran,

I left some comments, but to go a bit faster I implemented the changes
in a fixup patch that I pasted below.  If you are fine with the changes,
you can just integrate them in your patch.

> diff --git a/gdb/dwarf2/loc.c b/gdb/dwarf2/loc.c
> index e816f926696..2fa68ff0426 100644
> --- a/gdb/dwarf2/loc.c
> +++ b/gdb/dwarf2/loc.c
> @@ -2736,151 +2736,430 @@ dwarf2_compile_property_to_c (string_file *stream,
>  			     data, data + size, per_cu, per_objfile);
>  }
>  
> -\f
> -/* Helper functions and baton for dwarf2_loc_desc_get_symbol_read_needs.  */
> +/* Compute the correct symbol_needs_kind value for the location
> +   expression in EXPR.  */
>  
> -class symbol_needs_eval_context : public dwarf_expr_context
> +static enum symbol_needs_kind
> +dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
> +			     dwarf2_per_cu_data *per_cu,
> +			     dwarf2_per_objfile *per_objfile,
> +			     bfd_endian byte_order,
> +			     int addr_size,
> +			     int ref_addr_size,
> +			     int depth = 0)
>  {
> -public:
> -  symbol_needs_eval_context (dwarf2_per_objfile *per_objfile)
> -    : dwarf_expr_context (per_objfile)
> -  {}
> +  const gdb_byte *expr_end = expr.data () + expr.size ();
> +  enum symbol_needs_kind symbol_needs = SYMBOL_NEEDS_NONE;
> +  const int max_depth = 256;
> +  std::vector <const gdb_byte *> next_op;
> +  std::set <const gdb_byte *> visited_op;

"next_op" and "visited_op" should be plural (next_ops and
visited_ops).  Although I'd renamed next_ops to ops_to_visit, I find
that a bit clearer.

Please add comments for the important variables here to say what they
are for (at least the set and the vector).

> -  enum symbol_needs_kind needs;
> -  struct dwarf2_per_cu_data *per_cu;
> +  if (depth > max_depth)
> +    error (_("DWARF-2 expression error: Loop detected (%d)."), depth);

At first I didn't undersand why this depth checking is needed. Given we
have the visited_ops set, we are sure to converge and not be stuck in an
infinite loop, even if there are loops in the expression.  However,
after reading the code, I now understand that this depth check is for
expressions that call another expression with DW_OP_call*.  So if for
example expression A calls expression B, which calls expression A, we
wouldn't catch it without this depth check.  Is that right?  If so,
maybe add a comment to explain this?

As for the error message, I don't think it's useful to print the depth.
It will just appear as "Loop detected (257)" to the user, which just
looks like a random number to them.  It would be nice to print some
context, like what was being evaluated, or at least which objfile /
compilation unit contains the problematic expression(s), but that
doesn't seem easy.


> -  /* Reads from registers do require a frame.  */
> -  CORE_ADDR read_addr_from_reg (int regnum) override
> -  {
> -    needs = SYMBOL_NEEDS_FRAME;
> -    return 1;
> -  }
> +  depth++;
>  
> -  /* "get_reg_value" callback: Reads from registers do require a
> -     frame.  */
> +  next_op.push_back (expr.data ());
>  
> -  struct value *get_reg_value (struct type *type, int regnum) override
> -  {
> -    needs = SYMBOL_NEEDS_FRAME;
> -    return value_zero (type, not_lval);
> -  }
> +  while (!next_op.empty ())
> +    {
> +      const gdb_byte *op_ptr = next_op.back ();

I'd suggest popping the item out of next_ops and inserting it in
visited_ops right here, because we know we are unconditionally visiting
it now.  And then, insert an op in next_ops only if it's not in
visited_ops.  This way, we know there are only unvisited ops in
next_ops, and each op goes through next_ops exactly once.

>  
> -  /* Reads from memory do not require a frame.  */
> -  void read_mem (gdb_byte *buf, CORE_ADDR addr, size_t len) override
> -  {
> -    memset (buf, 0, len);
> -  }
> +      if (op_ptr >= expr_end
> +	  || visited_op.find(op_ptr) != visited_op.end())

Missing spaces.

> +	{
> +	  next_op.pop_back ();
> +	  continue;
> +	}
>  
> -  /* Frame-relative accesses do require a frame.  */
> -  void get_frame_base (const gdb_byte **start, size_t *length) override
> -  {
> -    static gdb_byte lit0 = DW_OP_lit0;
> +      enum dwarf_location_atom op
> +	= (enum dwarf_location_atom) *op_ptr++;
> +      uint64_t uoffset;
> +      int64_t offset;
>  
> -    *start = &lit0;
> -    *length = 1;
> +      /* The DWARF expression might have a bug causing an infinite
> +	 loop.  In that case, quitting is the only way out.  */
> +      QUIT;
>  
> -    needs = SYMBOL_NEEDS_FRAME;
> -  }
> +      switch (op)
> +	{
> +	case DW_OP_lit0:
> +	case DW_OP_lit1:
> +	case DW_OP_lit2:
> +	case DW_OP_lit3:
> +	case DW_OP_lit4:
> +	case DW_OP_lit5:
> +	case DW_OP_lit6:
> +	case DW_OP_lit7:
> +	case DW_OP_lit8:
> +	case DW_OP_lit9:
> +	case DW_OP_lit10:
> +	case DW_OP_lit11:
> +	case DW_OP_lit12:
> +	case DW_OP_lit13:
> +	case DW_OP_lit14:
> +	case DW_OP_lit15:
> +	case DW_OP_lit16:
> +	case DW_OP_lit17:
> +	case DW_OP_lit18:
> +	case DW_OP_lit19:
> +	case DW_OP_lit20:
> +	case DW_OP_lit21:
> +	case DW_OP_lit22:
> +	case DW_OP_lit23:
> +	case DW_OP_lit24:
> +	case DW_OP_lit25:
> +	case DW_OP_lit26:
> +	case DW_OP_lit27:
> +	case DW_OP_lit28:
> +	case DW_OP_lit29:
> +	case DW_OP_lit30:
> +	case DW_OP_lit31:
> +	case DW_OP_stack_value:

At first, I thought that when encountering DW_OP_stack_value, we should
not push the following op to the list of ops to visit, because the spec
says:

    The DW_OP_stack_value operation terminates the expression.

That sounds to me like when encountering this, you stop evaluating and
whatever is on the stack is the value.  But there are cases where a
DW_OP_stack_value is in the middle of the expression:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.dwarf2/var-access.exp;h=7ccec20db6d877c2defe980bbf4acfe874249624;hb=HEAD#l240

So when I tried to implement that, it broke this test.

> +	case DW_OP_call2:
> +	  {
> +	    cu_offset cu_off
> +	      = (cu_offset) extract_unsigned_integer (op_ptr, 2, byte_order);
> +	    op_ptr += 2;
> +
> +	    auto get_frame_pc = [&symbol_needs] ()
> +	      {
> +		symbol_needs = SYMBOL_NEEDS_FRAME;
> +		return 0;
> +	      };
> +
> +	    struct dwarf2_locexpr_baton baton
> +	      = dwarf2_fetch_die_loc_cu_off (cu_off, per_cu,
> +					     per_objfile,
> +					     get_frame_pc);
>  
> -  ctx.eval (data, size);
> +	    /* If SYMBOL_NEEDS_FRAME is returned from the previous call,
> +	       we dont have to check the baton content.  */
> +	    if (symbol_needs != SYMBOL_NEEDS_FRAME)
> +	      {
> +		gdbarch *arch = baton.per_objfile->objfile->arch ();
> +		gdb::array_view<const gdb_byte> sub_expr (baton.data,
> +							  baton.size);
> +		symbol_needs
> +		  = dwarf2_get_symbol_read_needs (sub_expr,
> +						  baton.per_cu,
> +						  baton.per_objfile,
> +						  gdbarch_byte_order (arch),
> +						  baton.per_cu->addr_size (),
> +						  baton.per_cu->ref_addr_size (),
> +						  depth);
> +	      }
> +	    break;
> +	  }
> +
> +	case DW_OP_call4:

Since these two cases look alike, I'd suggest merging them.

I forgot to mention above, but I'd also suggest declaring uoffset and
offset in the specific cases that need it.  And you can use
safe_skip_leb128 to avoid much of their uses.

And the fixup patch:


From 5ed60b518e01a5ac7ec4e96176cb110078df2ff1 Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Mon, 7 Jun 2021 17:05:20 -0400
Subject: [PATCH] Patch 1 proposed fixups

Change-Id: Id6f57ebc96cd74b95465532ce2048b2c28de8642
---
 gdb/dwarf2/loc.c | 178 ++++++++++++++++++++++++-----------------------
 1 file changed, 91 insertions(+), 87 deletions(-)

diff --git a/gdb/dwarf2/loc.c b/gdb/dwarf2/loc.c
index 9f84766d1d2f..802b4457cec1 100644
--- a/gdb/dwarf2/loc.c
+++ b/gdb/dwarf2/loc.c
@@ -2772,45 +2772,76 @@ dwarf2_compile_property_to_c (string_file *stream,
 }
 
 /* Compute the correct symbol_needs_kind value for the location
-   expression in EXPR.  */
+   expression in EXPR.
 
-static enum symbol_needs_kind
+   Implemented by traversing the logical control flow graph of the
+   expression.  */
+
+static symbol_needs_kind
 dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
-			     dwarf2_per_cu_data *per_cu,
-			     dwarf2_per_objfile *per_objfile,
-			     bfd_endian byte_order,
-			     int addr_size,
-			     int ref_addr_size,
-			     int depth = 0)
+			      dwarf2_per_cu_data *per_cu,
+			      dwarf2_per_objfile *per_objfile,
+			      bfd_endian byte_order,
+			      int addr_size,
+			      int ref_addr_size,
+			      int depth = 0)
 {
-  const gdb_byte *expr_end = expr.data () + expr.size ();
   enum symbol_needs_kind symbol_needs = SYMBOL_NEEDS_NONE;
-  const int max_depth = 256;
-  std::vector <const gdb_byte *> next_op;
-  std::set <const gdb_byte *> visited_op;
 
+  /* If the expression is empty, we have nothing to do.  */
+  if (expr.empty ())
+    return symbol_needs;
+
+  const gdb_byte *expr_end = &expr[0] + expr.size ();
+
+  /* List of operations to visit.  Operations in this list are not visited yet,
+     so are not in VISITED_OPS (and vice-versa).  */
+  std::vector<const gdb_byte *> ops_to_visit;
+
+  /* Operations already visited.  */
+  std::unordered_set<const gdb_byte *> visited_ops;
+
+  /* Insert OP in OPS_TO_VISIT if it is within the expression's range and
+     hasn't been visited yet.  */
+  auto insert_in_ops_to_visit
+    = [expr_end, &visited_ops, &ops_to_visit] (const gdb_byte *op_ptr)
+      {
+	if (op_ptr >= expr_end)
+	  return;
+
+	if (visited_ops.find (op_ptr) != visited_ops.end ())
+	  return;
+
+	ops_to_visit.push_back (op_ptr);
+      };
+
+  /* Expressions can invoke other expressions with DW_OP_call*.  Protect against
+     a loop of calls.  */
+  const int max_depth = 256;
   if (depth > max_depth)
-    error (_("DWARF-2 expression error: Loop detected (%d)."), depth);
+    error (_("DWARF-2 expression error: Loop detected."));
 
   depth++;
 
-  next_op.push_back (expr.data ());
+  /* Initialize the to-visit list with the first operation.  */
+  insert_in_ops_to_visit (&expr[0]);
 
-  while (!next_op.empty ())
+  while (!ops_to_visit.empty ())
     {
-      const gdb_byte *op_ptr = next_op.back ();
+      /* Pop one op to visit, mark it as visited.  */
+      const gdb_byte *op_ptr = ops_to_visit.back ();
+      ops_to_visit.pop_back ();
+      gdb_assert (visited_ops.find (op_ptr) == visited_ops.end ());
+      visited_ops.insert (op_ptr);
 
-      if (op_ptr >= expr_end
-	  || visited_op.find(op_ptr) != visited_op.end())
-	{
-	  next_op.pop_back ();
-	  continue;
-	}
+      dwarf_location_atom op = (dwarf_location_atom) *op_ptr;
 
-      enum dwarf_location_atom op
-	= (enum dwarf_location_atom) *op_ptr++;
-      uint64_t uoffset;
-      int64_t offset;
+      /* Most operations have a single possible following operation (they are
+         not conditional branches).  The code below updates OP_PTR to point to
+	 that following operation, which is pushed back to OPS_TO_VISIT, if
+	 needed, at the bottom.  Here, leave OP_PTR pointing just after the
+	 operand.  */
+      op_ptr++;
 
       /* The DWARF expression might have a bug causing an infinite
 	 loop.  In that case, quitting is the only way out.  */
@@ -2898,22 +2929,22 @@ dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
 	case DW_OP_constu:
 	case DW_OP_plus_uconst:
 	case DW_OP_piece:
-	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  op_ptr = safe_skip_leb128 (op_ptr, expr_end);
 	  break;
 
 	case DW_OP_consts:
-	  op_ptr = safe_read_sleb128 (op_ptr, expr_end, &offset);
+	  op_ptr = safe_skip_leb128 (op_ptr, expr_end);
 	  break;
 
 	case DW_OP_bit_piece:
-	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
-	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  op_ptr = safe_skip_leb128 (op_ptr, expr_end);
+	  op_ptr = safe_skip_leb128 (op_ptr, expr_end);
 	  break;
 
 	case DW_OP_deref_type:
 	case DW_OP_GNU_deref_type:
 	  op_ptr++;
-	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	  op_ptr = safe_skip_leb128 (op_ptr, expr_end);
 	  break;
 
 	case DW_OP_addr:
@@ -3017,14 +3048,17 @@ dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
 	  break;
 
 	case DW_OP_implicit_value:
-	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
-	  op_ptr += uoffset;
+	  {
+	    uint64_t uoffset;
+	    op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	    op_ptr += uoffset;
+	  }
 	  break;
 
 	case DW_OP_implicit_pointer:
 	case DW_OP_GNU_implicit_pointer:
 	  op_ptr += ref_addr_size;
-	  op_ptr = safe_read_sleb128 (op_ptr, expr_end, &offset);
+	  op_ptr = safe_skip_leb128 (op_ptr, expr_end);
 	  break;
 
 	case DW_OP_deref_size:
@@ -3033,62 +3067,31 @@ dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
 	  break;
 
 	case DW_OP_skip:
-	  offset = extract_signed_integer (op_ptr, 2, byte_order);
-	  op_ptr += 2;
-	  op_ptr += offset;
+	  {
+	    LONGEST offset = extract_signed_integer (op_ptr, 2, byte_order);
+	    op_ptr += 2;
+	    op_ptr += offset;
+	  }
 	  break;
 
 	case DW_OP_bra:
-	  visited_op.insert (next_op.back ());
-	  next_op.pop_back ();
-
-	  offset = extract_signed_integer (op_ptr, 2, byte_order);
-	  op_ptr += 2;
-	  next_op.push_back (op_ptr + offset);
-	  next_op.push_back (op_ptr);
-	  continue;
-
-	case DW_OP_call2:
 	  {
-	    cu_offset cu_off
-	      = (cu_offset) extract_unsigned_integer (op_ptr, 2, byte_order);
+	    /* This is the only operation that pushes two operations in the
+	       to-visit list, so handle it all here.  */
+	    LONGEST offset = extract_signed_integer (op_ptr, 2, byte_order);
 	    op_ptr += 2;
-
-	    auto get_frame_pc = [&symbol_needs] ()
-	      {
-		symbol_needs = SYMBOL_NEEDS_FRAME;
-		return 0;
-	      };
-
-	    struct dwarf2_locexpr_baton baton
-	      = dwarf2_fetch_die_loc_cu_off (cu_off, per_cu,
-					     per_objfile,
-					     get_frame_pc);
-
-	    /* If SYMBOL_NEEDS_FRAME is returned from the previous call,
-	       we dont have to check the baton content.  */
-	    if (symbol_needs != SYMBOL_NEEDS_FRAME)
-	      {
-		gdbarch *arch = baton.per_objfile->objfile->arch ();
-		gdb::array_view<const gdb_byte> sub_expr (baton.data,
-							  baton.size);
-		symbol_needs
-		  = dwarf2_get_symbol_read_needs (sub_expr,
-						  baton.per_cu,
-						  baton.per_objfile,
-						  gdbarch_byte_order (arch),
-						  baton.per_cu->addr_size (),
-						  baton.per_cu->ref_addr_size (),
-						  depth);
-	      }
-	    break;
+	    insert_in_ops_to_visit (op_ptr + offset);
+	    insert_in_ops_to_visit (op_ptr);
+	    continue;
 	  }
 
+	case DW_OP_call2:
 	case DW_OP_call4:
 	  {
+	    unsigned int len = op == DW_OP_call2 ? 2 : 4;
 	    cu_offset cu_off
-	      = (cu_offset) extract_unsigned_integer (op_ptr, 4, byte_order);
-	    op_ptr += 4;
+	      = (cu_offset) extract_unsigned_integer (op_ptr, len, byte_order);
+	    op_ptr += len;
 
 	    auto get_frame_pc = [&symbol_needs] ()
 	      {
@@ -3175,10 +3178,13 @@ dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
 
 	case DW_OP_const_type:
 	case DW_OP_GNU_const_type:
-	  op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
-	  offset = *op_ptr++;
-	  op_ptr += offset;
-	  break;
+	  {
+	    uint64_t uoffset;
+	    op_ptr = safe_read_uleb128 (op_ptr, expr_end, &uoffset);
+	    gdb_byte offset = *op_ptr++;
+	    op_ptr += offset;
+	    break;
+	  }
 
 	default:
 	  error (_("Unhandled DWARF expression opcode 0x%x"), op);
@@ -3189,9 +3195,7 @@ dwarf2_get_symbol_read_needs (gdb::array_view<const gdb_byte> expr,
       if (symbol_needs == SYMBOL_NEEDS_FRAME)
 	break;
 
-      visited_op.insert (next_op.back ());
-      next_op.pop_back ();
-      next_op.push_back (op_ptr);
+      insert_in_ops_to_visit (op_ptr);
     }
 
   return symbol_needs;
-- 
2.31.1


  reply	other threads:[~2021-06-07 21:48 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28 15:46 [PATCH v3 00/17] DWARF expression evaluator design cleanup Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 01/17] Replace the symbol needs evaluator with a parser Zoran Zaric
2021-06-07 21:48   ` Simon Marchi [this message]
2021-06-11 17:22     ` Zaric, Zoran (Zare)
2021-05-28 15:46 ` [PATCH v3 02/17] Cleanup of the dwarf_expr_context constructor Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 03/17] Move frame context info to dwarf_expr_context Zoran Zaric
2021-06-09  0:45   ` Simon Marchi
2021-06-11 17:16     ` Zaric, Zoran (Zare)
2021-05-28 15:46 ` [PATCH v3 04/17] Remove get_frame_cfa from dwarf_expr_context Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 05/17] Move compilation unit info to dwarf_expr_context Zoran Zaric
2021-06-09  0:50   ` Simon Marchi
2021-06-11 17:19     ` Zaric, Zoran (Zare)
2021-05-28 15:46 ` [PATCH v3 06/17] Move dwarf_call " Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 07/17] Move get_object_address " Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 08/17] Move read_mem " Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 09/17] Move push_dwarf_reg_entry_value to expr.c Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 10/17] Inline get_reg_value method of dwarf_expr_context Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 11/17] Remove empty frame and full evaluators Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 12/17] Merge evaluate_for_locexpr_baton evaluator Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 13/17] Move piece_closure and its support to expr.c Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 14/17] Make value_copy also copy the stack data member Zoran Zaric
2021-05-28 15:46 ` [PATCH v3 15/17] Make DWARF evaluator return a single struct value Zoran Zaric
2021-08-11 17:00   ` Tom Tromey
2021-08-12 18:03     ` Tom Tromey
2021-08-13  9:57       ` Zoran Zaric
2021-08-13 16:59         ` Tom Tromey
2021-08-13 17:57           ` Zoran Zaric
2021-08-16 15:37             ` Tom Tromey
2021-08-16 16:05               ` Zoran Zaric
2021-08-16 17:32                 ` Tom Tromey
2021-05-28 15:46 ` [PATCH v3 16/17] Simplify dwarf_expr_context class interface Zoran Zaric
2021-06-09  1:01   ` Simon Marchi
2021-06-11 17:20     ` Zaric, Zoran (Zare)
2021-05-28 15:46 ` [PATCH v3 17/17] Add as_lval argument to expression evaluator Zoran Zaric
2021-06-09  1:04 ` [PATCH v3 00/17] DWARF expression evaluator design cleanup Simon Marchi
2021-08-05 16:38   ` Zaric, Zoran (Zare)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=395847ba-8138-018f-f618-528d1a22e9fa@polymtl.ca \
    --to=simon.marchi@polymtl.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=zoran.zaric@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).